2022-12-01T10:12:06.0911497Z Requested labels: linux.8xlarge.nvidia.gpu
2022-12-01T10:12:06.0911570Z Job defined at: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/pull/89997/merge
2022-12-01T10:12:06.0911593Z Waiting for a runner to pick up this job...
2022-12-01T10:12:06.5992678Z Job is about to start running on the runner: i-0eaaa5984457e9076 (organization)
2022-12-01T10:12:11.4107612Z Current runner version: '2.299.1'
2022-12-01T10:12:11.4115628Z Runner name: 'i-0eaaa5984457e9076'
2022-12-01T10:12:11.4116342Z Runner group name: 'Default'
2022-12-01T10:12:11.4117166Z Machine name: 'ip-10-0-0-161'
2022-12-01T10:12:11.4120078Z ##[group]GITHUB_TOKEN Permissions
2022-12-01T10:12:11.4121069Z Actions: read
2022-12-01T10:12:11.4121518Z Checks: read
2022-12-01T10:12:11.4122021Z Contents: read
2022-12-01T10:12:11.4122828Z Deployments: read
2022-12-01T10:12:11.4123265Z Discussions: read
2022-12-01T10:12:11.4123700Z Issues: read
2022-12-01T10:12:11.4124196Z Metadata: read
2022-12-01T10:12:11.4124578Z Packages: read
2022-12-01T10:12:11.4125025Z Pages: read
2022-12-01T10:12:11.4125469Z PullRequests: read
2022-12-01T10:12:11.4125913Z RepositoryProjects: read
2022-12-01T10:12:11.4126440Z SecurityEvents: read
2022-12-01T10:12:11.4126951Z Statuses: read
2022-12-01T10:12:11.4127336Z ##[endgroup]
2022-12-01T10:12:11.4131772Z Secret source: None
2022-12-01T10:12:11.4132750Z Prepare workflow directory
2022-12-01T10:12:11.7435025Z Prepare all required actions
2022-12-01T10:12:11.7669120Z Getting action download info
2022-12-01T10:12:11.9958657Z Download action repository 'pytorch/pytorch@master' (SHA:850b53bbee82fb194af85b566aedee94b96def32)
2022-12-01T10:12:15.4442929Z Download action repository 'pytorch/test-infra@main' (SHA:1f415583bdcd967e33ea8fd05be71ed0bdf19880)
2022-12-01T10:12:15.6920033Z Download action repository 'nick-fields/retry@7d4a37704547a311dbb66ebdf5b23ec19374a767' (SHA:7d4a37704547a311dbb66ebdf5b23ec19374a767)
2022-12-01T10:12:15.8246193Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2022-12-01T10:12:16.1608493Z Getting action download info
2022-12-01T10:12:16.3471934Z Download action repository 'malfet/checkout@silent-checkout' (SHA:c7b8fef48edfe1bca0044a44b1f7f7c4318a3076)
2022-12-01T10:12:16.5714051Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml
2022-12-01T10:12:16.5716693Z ##[group] Inputs
2022-12-01T10:12:16.5717105Z build-environment: linux-bionic-cuda11.6-py3.10-gcc7
2022-12-01T10:12:16.5718327Z test-matrix: { include: [ { config: "default", shard: 1, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 2, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 3, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 4, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "distributed", shard: 1, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, { config: "distributed", shard: 2, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, { config: "distributed", shard: 3, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, { config: "functorch", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, ]}
2022-12-01T10:12:16.5719600Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fa72f5a0a230eb632055220542038bd4ceca184b
2022-12-01T10:12:16.5720073Z sync-tag:
2022-12-01T10:12:16.5720320Z ##[endgroup]
2022-12-01T10:12:16.5721415Z Complete job name: linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 3, 3, linux.8xlarge.nvidia.gpu)
2022-12-01T10:12:16.6828969Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@master
2022-12-01T10:12:16.6829382Z with:
2022-12-01T10:12:16.6829647Z   submodules: recursive
2022-12-01T10:12:16.6829889Z   fetch-depth: 0
2022-12-01T10:12:16.6830121Z env:
2022-12-01T10:12:16.6830365Z   GIT_DEFAULT_BRANCH: master
2022-12-01T10:12:16.6830612Z ##[endgroup]
2022-12-01T10:12:16.7144039Z ##[group]Run retry () {
2022-12-01T10:12:16.7144371Z retry () {
2022-12-01T10:12:16.7144667Z  $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
2022-12-01T10:12:16.7144959Z }
2022-12-01T10:12:16.7145217Z echo "${GITHUB_WORKSPACE}"
2022-12-01T10:12:16.7145497Z if [ -z "${NO_SUDO}" ]; then
2022-12-01T10:12:16.7145808Z  retry sudo rm -rf "${GITHUB_WORKSPACE}"
2022-12-01T10:12:16.7146314Z else
2022-12-01T10:12:16.7146572Z  retry rm -rf "${GITHUB_WORKSPACE}"
2022-12-01T10:12:16.7146863Z fi
2022-12-01T10:12:16.7147189Z mkdir "${GITHUB_WORKSPACE}"
2022-12-01T10:12:16.7166489Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2022-12-01T10:12:16.7166808Z env:
2022-12-01T10:12:16.7167059Z   GIT_DEFAULT_BRANCH: master
2022-12-01T10:12:16.7167315Z   NO_SUDO:
2022-12-01T10:12:16.7167536Z ##[endgroup]
2022-12-01T10:12:16.7413886Z /home/ec2-user/actions-runner/_work/pytorch/pytorch
2022-12-01T10:12:19.7346875Z ##[group]Run malfet/checkout@silent-checkout
2022-12-01T10:12:19.7347273Z with:
2022-12-01T10:12:19.7347598Z   ref: c13d400bffe90e16b96520bbc8a41a6f0c9cd584
2022-12-01T10:12:19.7347930Z   fetch-depth: 0
2022-12-01T10:12:19.7348209Z   submodules: recursive
2022-12-01T10:12:19.7348512Z   quiet-checkout: true
2022-12-01T10:12:19.7348832Z   repository: pytorch/pytorch
2022-12-01T10:12:19.7349342Z   token: ***
2022-12-01T10:12:19.7349656Z   ssh-strict: true
2022-12-01T10:12:19.7349967Z   persist-credentials: true
2022-12-01T10:12:19.7350280Z   clean: true
2022-12-01T10:12:19.7350543Z   lfs: false
2022-12-01T10:12:19.7350836Z   set-safe-directory: true
2022-12-01T10:12:19.7351128Z env:
2022-12-01T10:12:19.7351389Z   GIT_DEFAULT_BRANCH: master
2022-12-01T10:12:19.7351690Z ##[endgroup]
2022-12-01T10:12:19.8971601Z Syncing repository: pytorch/pytorch
2022-12-01T10:12:19.8973561Z ##[group]Getting Git version info
2022-12-01T10:12:19.8974148Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2022-12-01T10:12:19.8974816Z [command]/usr/bin/git version
2022-12-01T10:12:19.8975109Z git version 2.37.1
2022-12-01T10:12:19.8983008Z ##[endgroup]
2022-12-01T10:12:19.9003215Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/cc84584b-b0e5-45f1-93df-2a2f5ee323b4' before making global git config changes
2022-12-01T10:12:19.9004475Z Adding repository directory to the temporary git global config as a safe directory
2022-12-01T10:12:19.9010572Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2022-12-01T10:12:19.9056522Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2022-12-01T10:12:19.9063129Z ##[group]Initializing the repository
2022-12-01T10:12:19.9066819Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2022-12-01T10:12:19.9099723Z hint: Using 'master' as the name for the initial branch. This default branch name
2022-12-01T10:12:19.9100359Z hint: is subject to change. To configure the initial branch name to use in all
2022-12-01T10:12:19.9101058Z hint: of your new repositories, which will suppress this warning, call:
2022-12-01T10:12:19.9101422Z hint:
2022-12-01T10:12:19.9101824Z hint: git config --global init.defaultBranch <name>
2022-12-01T10:12:19.9102161Z hint:
2022-12-01T10:12:19.9102587Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2022-12-01T10:12:19.9103144Z hint: 'development'. The just-created branch can be renamed via this command:
2022-12-01T10:12:19.9103487Z hint:
2022-12-01T10:12:19.9103991Z hint: git branch -m <name>
2022-12-01T10:12:19.9104571Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2022-12-01T10:12:19.9114360Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2022-12-01T10:12:19.9150414Z ##[endgroup]
2022-12-01T10:12:19.9150996Z ##[group]Disabling automatic garbage collection
2022-12-01T10:12:19.9154930Z [command]/usr/bin/git config --local gc.auto 0
2022-12-01T10:12:19.9187894Z ##[endgroup]
2022-12-01T10:12:19.9188666Z ##[group]Setting up auth
2022-12-01T10:12:19.9197942Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2022-12-01T10:12:19.9234027Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2022-12-01T10:12:19.9535687Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2022-12-01T10:12:19.9567305Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
2022-12-01T10:12:19.9867765Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2022-12-01T10:12:19.9913884Z ##[endgroup]
2022-12-01T10:12:19.9914442Z ##[group]Fetching the repository
2022-12-01T10:12:19.9923175Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --quiet --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2022-12-01T10:13:13.2870972Z [command]/usr/bin/git rev-parse --verify --quiet c13d400bffe90e16b96520bbc8a41a6f0c9cd584^{object}
2022-12-01T10:13:13.2914771Z [command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --quiet --no-recurse-submodules origin c13d400bffe90e16b96520bbc8a41a6f0c9cd584
2022-12-01T10:13:14.6166588Z ##[endgroup]
2022-12-01T10:13:14.6167359Z ##[group]Determining the checkout info
2022-12-01T10:13:14.6168620Z ##[endgroup]
2022-12-01T10:13:14.6169106Z ##[group]Checking out the ref
2022-12-01T10:13:14.6174132Z [command]/usr/bin/git checkout --quiet --force c13d400bffe90e16b96520bbc8a41a6f0c9cd584
2022-12-01T10:13:16.3119629Z ##[endgroup]
2022-12-01T10:13:16.3120256Z ##[group]Setting up auth for fetching submodules
2022-12-01T10:13:16.3126380Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic ***
2022-12-01T10:13:16.3219990Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf
2022-12-01T10:13:16.3254857Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com:
2022-12-01T10:13:16.3289493Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com:
2022-12-01T10:13:16.3318980Z ##[endgroup]
2022-12-01T10:13:16.3319491Z 
##[group]Fetching submodules 2022-12-01T10:13:16.3324516Z [command]/usr/bin/git submodule sync --recursive 2022-12-01T10:13:16.3653889Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2022-12-01T10:13:16.3967352Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni' 2022-12-01T10:13:16.3969472Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16' 2022-12-01T10:13:16.3972573Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2022-12-01T10:13:16.3975787Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK' 2022-12-01T10:13:16.3979176Z Submodule 'third_party/QNNPACK' (https://github.com/pytorch/QNNPACK) registered for path 'third_party/QNNPACK' 2022-12-01T10:13:16.3983055Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator' 2022-12-01T10:13:16.3986476Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2022-12-01T10:13:16.3990279Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2022-12-01T10:13:16.3994200Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2022-12-01T10:13:16.3998282Z Submodule 'third_party/cub' (https://github.com/NVlabs/cub.git) registered for path 'third_party/cub' 2022-12-01T10:13:16.4002862Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2022-12-01T10:13:16.4008977Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2022-12-01T10:13:16.4013305Z Submodule 'third_party/eigen' (https://gitlab.com/libeigen/eigen.git) registered for path 'third_party/eigen' 2022-12-01T10:13:16.4017844Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2022-12-01T10:13:16.4022750Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers' 2022-12-01T10:13:16.4027477Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2022-12-01T10:13:16.4032476Z Submodule 'third_party/foxi' (https://github.com/houseroad/foxi.git) registered for path 'third_party/foxi' 2022-12-01T10:13:16.4037632Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2022-12-01T10:13:16.4043353Z Submodule 'third_party/gloo' (https://github.com/facebookincubator/gloo) registered for path 'third_party/gloo' 2022-12-01T10:13:16.4049074Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2022-12-01T10:13:16.4054408Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2022-12-01T10:13:16.4060244Z Submodule 'third_party/ios-cmake' (https://github.com/Yangqing/ios-cmake.git) registered for path 'third_party/ios-cmake' 2022-12-01T10:13:16.4065967Z Submodule 'third_party/ittapi' 
(https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2022-12-01T10:13:16.4071959Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2022-12-01T10:13:16.4077979Z Submodule 'third_party/nccl/nccl' (https://github.com/NVIDIA/nccl) registered for path 'third_party/nccl/nccl' 2022-12-01T10:13:16.4084763Z Submodule 'third_party/neon2sse' (https://github.com/intel/ARM_NEON_2_x86_SSE.git) registered for path 'third_party/neon2sse' 2022-12-01T10:13:16.4090963Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2022-12-01T10:13:16.4097338Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2022-12-01T10:13:16.4103946Z Submodule 'third_party/onnx-tensorrt' (https://github.com/onnx/onnx-tensorrt) registered for path 'third_party/onnx-tensorrt' 2022-12-01T10:13:16.4110628Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2022-12-01T10:13:16.4117407Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2022-12-01T10:13:16.4124973Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2022-12-01T10:13:16.4131986Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2022-12-01T10:13:16.4139176Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2022-12-01T10:13:16.4146500Z Submodule 'third_party/python-enum' (https://github.com/PeachPy/enum34.git) registered for path 'third_party/python-enum' 2022-12-01T10:13:16.4153944Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2022-12-01T10:13:16.4161518Z Submodule 'third_party/python-six' (https://github.com/benjaminp/six.git) registered for path 'third_party/python-six' 2022-12-01T10:13:16.4169945Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2022-12-01T10:13:16.4177734Z Submodule 'third_party/tbb' (https://github.com/01org/tbb) registered for path 'third_party/tbb' 2022-12-01T10:13:16.4185812Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2022-12-01T10:13:16.4194073Z Submodule 'third_party/zstd' (https://github.com/facebook/zstd.git) registered for path 'third_party/zstd' 2022-12-01T10:13:16.4223171Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'... 2022-12-01T10:13:16.7748298Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'... 2022-12-01T10:13:17.0370456Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'... 2022-12-01T10:13:17.2617797Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'... 2022-12-01T10:13:17.5855310Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/QNNPACK'... 2022-12-01T10:13:17.8785031Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 
2022-12-01T10:13:19.9609992Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'... 2022-12-01T10:13:25.5200878Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'... 2022-12-01T10:13:25.9541113Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2022-12-01T10:13:26.5395159Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cub'... 2022-12-01T10:13:28.1618736Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2022-12-01T10:13:29.5200257Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'... 2022-12-01T10:13:31.2013526Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/eigen'... 2022-12-01T10:13:38.1092464Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'... 2022-12-01T10:13:38.9638782Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2022-12-01T10:13:40.3273258Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'... 2022-12-01T10:13:41.4810720Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/foxi'... 2022-12-01T10:13:41.6939264Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2022-12-01T10:13:42.2416425Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'... 2022-12-01T10:13:42.8256875Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'... 2022-12-01T10:13:43.9111924Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'... 2022-12-01T10:13:44.3517759Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ios-cmake'... 2022-12-01T10:13:44.5490674Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'... 2022-12-01T10:13:44.8347951Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'... 2022-12-01T10:13:48.2896178Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nccl/nccl'... 2022-12-01T10:13:48.7654176Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/neon2sse'... 2022-12-01T10:13:49.1612900Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'... 2022-12-01T10:13:55.4452591Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'... 2022-12-01T10:13:57.0471266Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt'... 2022-12-01T10:13:57.5118888Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'... 2022-12-01T10:13:57.7816734Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'... 2022-12-01T10:14:03.9372402Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'... 2022-12-01T10:14:04.1588703Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2022-12-01T10:14:04.4310508Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'... 2022-12-01T10:14:05.3765571Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-enum'... 
2022-12-01T10:14:05.6220254Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'... 2022-12-01T10:14:06.0103071Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-six'... 2022-12-01T10:14:06.3478533Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'... 2022-12-01T10:14:06.9500837Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tbb'... 2022-12-01T10:14:09.5458540Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2022-12-01T10:14:10.0796494Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/zstd'... 2022-12-01T10:14:12.3572887Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2022-12-01T10:14:12.3702723Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2022-12-01T10:14:12.3803934Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2022-12-01T10:14:12.4093206Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2022-12-01T10:14:12.4372221Z Submodule path 'third_party/QNNPACK': checked out '7d2a4e9931a82adc3814275b6219a03e24e36b4c' 2022-12-01T10:14:12.4826623Z Submodule path 'third_party/VulkanMemoryAllocator': checked out 'a6bfc237255a6bac1513f7c1ebde6d8aed6b5191' 2022-12-01T10:14:13.2810680Z Submodule path 'third_party/XNNPACK': checked out 'ae108ef49aa5623b896fc93d4298c49d1750d9ba' 2022-12-01T10:14:13.3064929Z Submodule path 'third_party/benchmark': checked out '0d98dba29d66e93259db7daa53a9327df767a415' 2022-12-01T10:14:13.4281650Z Submodule path 'third_party/cpuinfo': checked out '8ec7bd91ad0470e61cf38f618cc1f270dede599c' 2022-12-01T10:14:13.4694308Z Submodule path 'third_party/cub': checked out 'd106ddb991a56c3df1b6d51b2409e36ba8181ce4' 2022-12-01T10:14:13.8271062Z Submodule path 'third_party/cudnn_frontend': checked out '171a7a986f7fbd9ed71bd0cf3c7ad4f55843d6b3' 2022-12-01T10:14:14.3595458Z Submodule path 'third_party/cutlass': checked out 'b72cbf957df8cf84a6d0ff91c190ad51a9c1d24a' 2022-12-01T10:14:14.6616667Z Submodule path 'third_party/eigen': checked out '3147391d946bb4b6c68edd901f2add6ac1f31f8c' 2022-12-01T10:14:14.7180053Z Submodule path 'third_party/fbgemm': checked out '0d98c261561524cce92e37fe307ea6596664309a' 2022-12-01T10:14:14.7198315Z Submodule 'third_party/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/third_party/asmjit' 2022-12-01T10:14:14.7201419Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/third_party/cpuinfo' 2022-12-01T10:14:14.7205443Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/third_party/googletest' 2022-12-01T10:14:14.7209032Z Submodule 'third_party/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/third_party/hipify_torch' 2022-12-01T10:14:14.7236113Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/asmjit'... 2022-12-01T10:14:15.6867931Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/cpuinfo'... 2022-12-01T10:14:16.2772989Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/googletest'... 
2022-12-01T10:14:17.3218583Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/hipify_torch'... 2022-12-01T10:14:17.6826055Z Submodule path 'third_party/fbgemm/third_party/asmjit': checked out 'd3fbf7c9bc7c1d1365a94a45614b91c5a3706b81' 2022-12-01T10:14:17.8063184Z Submodule path 'third_party/fbgemm/third_party/cpuinfo': checked out 'ed8b86a253800bafdb7b25c5c399f91bff9cb1f3' 2022-12-01T10:14:17.8783158Z Submodule path 'third_party/fbgemm/third_party/googletest': checked out 'cbf019de22c8dd37b2108da35b2748fd702d1796' 2022-12-01T10:14:17.8898060Z Submodule path 'third_party/fbgemm/third_party/hipify_torch': checked out '1840658c184f3eeba787dae0f06c45756c1daaf5' 2022-12-01T10:14:17.9951200Z Submodule path 'third_party/flatbuffers': checked out 'd0cede9c90c5257537c293517a21376408b549fa' 2022-12-01T10:14:18.0383103Z Submodule path 'third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2022-12-01T10:14:18.0485948Z Submodule path 'third_party/foxi': checked out 'c278588e34e535f0bb8f00df3880d26928038cad' 2022-12-01T10:14:18.0970734Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2022-12-01T10:14:18.1262218Z Submodule path 'third_party/gloo': checked out '5b143513263133af2b95547e97c07cebeb72bf72' 2022-12-01T10:14:18.1826439Z Submodule path 'third_party/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2022-12-01T10:14:18.1957424Z Submodule path 'third_party/ideep': checked out '77d662b313a762e82b389d3fd965e0098f12cd99' 2022-12-01T10:14:18.1975312Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2022-12-01T10:14:18.2002039Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 2022-12-01T10:14:26.7604315Z Submodule path 'third_party/ideep/mkl-dnn': checked out '888a87a954e4fddb4d81fd10858eb834f2441b46' 2022-12-01T10:14:26.7623825Z Submodule 'third_party/oneDNN' (https://github.com/oneapi-src/oneDNN.git) registered for path 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-12-01T10:14:26.7652272Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN'... 2022-12-01T10:14:35.2693111Z Submodule path 'third_party/ideep/mkl-dnn/third_party/oneDNN': checked out '52b5f107dd9cf10910aaa19cb47f3abf9b349815' 2022-12-01T10:14:35.2812395Z Submodule path 'third_party/ios-cmake': checked out '8abaed637d56f1337d6e1d2c4026e25c1eade724' 2022-12-01T10:14:35.2983403Z Submodule path 'third_party/ittapi': checked out '5b8a7d7422611c3a0d799fb5fc5dd4abfae35b42' 2022-12-01T10:14:35.4133515Z Submodule path 'third_party/kineto': checked out '0703c78999061b8329dfab7ec5046fc5764a5573' 2022-12-01T10:14:35.4152024Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2022-12-01T10:14:35.4155290Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2022-12-01T10:14:35.4182540Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2022-12-01T10:14:36.5880067Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 
2022-12-01T10:14:37.7443164Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '2591ab91c3898c9f6544fff04660276537d32ffd' 2022-12-01T10:14:37.8107501Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347' 2022-12-01T10:14:37.8353269Z Submodule path 'third_party/nccl/nccl': checked out 'f89fd4777d2ef9229c039ff750ae21da01626f52' 2022-12-01T10:14:37.8516062Z Submodule path 'third_party/neon2sse': checked out '97a126f08ce318023be604d03f88bf0820a9464a' 2022-12-01T10:14:37.9864334Z Submodule path 'third_party/nlohmann': checked out '87cda1d6646592ac5866dc703c8e1839046a6806' 2022-12-01T10:14:38.3093754Z Submodule path 'third_party/onnx': checked out 'f7ee1ac60d06abe8e26c9b6bbe1e3db5286b614b' 2022-12-01T10:14:38.3126434Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx/third_party/benchmark' 2022-12-01T10:14:38.3129740Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2022-12-01T10:14:38.3157519Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/benchmark'... 2022-12-01T10:14:38.7374252Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 2022-12-01T10:14:39.6126099Z Submodule path 'third_party/onnx/third_party/benchmark': checked out '0d98dba29d66e93259db7daa53a9327df767a415' 2022-12-01T10:14:39.6512825Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'ffa346860b306c9bbfb341aed9c14c067751feb8' 2022-12-01T10:14:39.6695619Z Submodule path 'third_party/onnx-tensorrt': checked out 'c153211418a7c57ce071d9ce2a41f8d1c85a878f' 2022-12-01T10:14:39.6712811Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx' 2022-12-01T10:14:39.6739746Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx'... 2022-12-01T10:14:41.4888750Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx': checked out '765f5ee823a67a866f4bd28a9860e81f3c811ce8' 2022-12-01T10:14:41.4911745Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-12-01T10:14:41.4914981Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-12-01T10:14:41.4943770Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'... 2022-12-01T10:14:41.9157884Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'... 
2022-12-01T10:14:42.8368528Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508' 2022-12-01T10:14:42.9258808Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11': checked out 'a1041190c8b8ff0cd9e2f0752248ad5e3789ea0c' 2022-12-01T10:14:42.9266350Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-12-01T10:14:42.9293455Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'... 2022-12-01T10:14:43.1695829Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2022-12-01T10:14:43.1800709Z Submodule path 'third_party/pocketfft': checked out 'ea778e37710c07723435b1be58235996d1d43a5a' 2022-12-01T10:14:43.5067505Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2022-12-01T10:14:43.5091600Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2022-12-01T10:14:43.5094530Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2022-12-01T10:14:43.5121999Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2022-12-01T10:14:43.9658942Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 2022-12-01T10:14:45.8345964Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2022-12-01T10:14:45.9183367Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2022-12-01T10:14:45.9280172Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2022-12-01T10:14:45.9407927Z Submodule path 'third_party/pthreadpool': checked out 'a134dd5d4cee80cce15db81a72e7f929d71dd413' 2022-12-01T10:14:45.9817991Z Submodule path 'third_party/pybind11': checked out 'aa304c9c7d725ffb9d10af08a3b34cb372307020' 2022-12-01T10:14:45.9918422Z Submodule path 'third_party/python-enum': checked out '4cfedc426c4e2fc52e3f5c2b4297e15ed8d6b8c7' 2022-12-01T10:14:46.0260760Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2022-12-01T10:14:46.0369334Z Submodule path 'third_party/python-six': checked out '15e31431af97e5e64b80af0a3f598d382bcdd49a' 2022-12-01T10:14:46.0917140Z Submodule path 'third_party/sleef': checked out 'e0a003ee838b75d11763aa9c3ef17bf71a725bff' 2022-12-01T10:14:46.2290951Z Submodule path 'third_party/tbb': checked out 'a51a90bc609bb73db8ea13841b5cf7aa4344d4a9' 2022-12-01T10:14:46.2614800Z Submodule path 'third_party/tensorpipe': checked out '52791a2fd214b2a9dc5759d36725909c1daa7f2e' 2022-12-01T10:14:46.2631754Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2022-12-01T10:14:46.2634947Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2022-12-01T10:14:46.2638406Z Submodule 
'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2022-12-01T10:14:46.2641821Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2022-12-01T10:14:46.2670529Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2022-12-01T10:14:47.3105835Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 2022-12-01T10:14:47.5908598Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2022-12-01T10:14:48.9519091Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2022-12-01T10:14:49.8709339Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2022-12-01T10:14:49.8880356Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2022-12-01T10:14:49.9686883Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '1dff88e5161cba5c59276d2070d2e304e4dcb242' 2022-12-01T10:14:50.0017470Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2022-12-01T10:14:50.0034535Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-12-01T10:14:50.0061900Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 2022-12-01T10:14:50.2602183Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2022-12-01T10:14:50.4226945Z Submodule path 'third_party/zstd': checked out 'aec56a52fbab207fc639a1937d1e708a282edca8' 2022-12-01T10:14:50.4261193Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2022-12-01T10:14:50.4613352Z Entering 'android/libs/fbjni' 2022-12-01T10:14:50.4658591Z Entering 'third_party/FP16' 2022-12-01T10:14:50.4704813Z Entering 'third_party/FXdiv' 2022-12-01T10:14:50.4752825Z Entering 'third_party/NNPACK' 2022-12-01T10:14:50.4799470Z Entering 'third_party/QNNPACK' 2022-12-01T10:14:50.4847727Z Entering 'third_party/VulkanMemoryAllocator' 2022-12-01T10:14:50.4894937Z Entering 'third_party/XNNPACK' 2022-12-01T10:14:50.4952150Z Entering 'third_party/benchmark' 2022-12-01T10:14:50.4998321Z Entering 'third_party/cpuinfo' 2022-12-01T10:14:50.5046277Z Entering 'third_party/cub' 2022-12-01T10:14:50.5090433Z Entering 'third_party/cudnn_frontend' 2022-12-01T10:14:50.5140448Z Entering 'third_party/cutlass' 2022-12-01T10:14:50.5192218Z Entering 'third_party/eigen' 2022-12-01T10:14:50.5237771Z Entering 'third_party/fbgemm' 2022-12-01T10:14:50.5283145Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-12-01T10:14:50.5327583Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-12-01T10:14:50.5374277Z Entering 'third_party/fbgemm/third_party/googletest' 2022-12-01T10:14:50.5419202Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2022-12-01T10:14:50.5464036Z Entering 'third_party/flatbuffers' 2022-12-01T10:14:50.5509959Z Entering 'third_party/fmt' 2022-12-01T10:14:50.5553414Z Entering 'third_party/foxi' 2022-12-01T10:14:50.5596806Z Entering 
'third_party/gemmlowp/gemmlowp' 2022-12-01T10:14:50.5640440Z Entering 'third_party/gloo' 2022-12-01T10:14:50.5685613Z Entering 'third_party/googletest' 2022-12-01T10:14:50.5730162Z Entering 'third_party/ideep' 2022-12-01T10:14:50.5773676Z Entering 'third_party/ideep/mkl-dnn' 2022-12-01T10:14:50.5818561Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-12-01T10:14:50.5869196Z Entering 'third_party/ios-cmake' 2022-12-01T10:14:50.5913191Z Entering 'third_party/ittapi' 2022-12-01T10:14:50.5957939Z Entering 'third_party/kineto' 2022-12-01T10:14:50.6003861Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-12-01T10:14:50.6050553Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-12-01T10:14:50.6098239Z Entering 'third_party/nccl/nccl' 2022-12-01T10:14:50.6144260Z Entering 'third_party/neon2sse' 2022-12-01T10:14:50.6188740Z Entering 'third_party/nlohmann' 2022-12-01T10:14:50.6234818Z Entering 'third_party/onnx' 2022-12-01T10:14:50.6296043Z Entering 'third_party/onnx/third_party/benchmark' 2022-12-01T10:14:50.6342386Z Entering 'third_party/onnx/third_party/pybind11' 2022-12-01T10:14:50.6392039Z Entering 'third_party/onnx-tensorrt' 2022-12-01T10:14:50.6436314Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-12-01T10:14:50.6487112Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-12-01T10:14:50.6533489Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-12-01T10:14:50.6578135Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-12-01T10:14:50.6626286Z Entering 'third_party/pocketfft' 2022-12-01T10:14:50.6669977Z Entering 'third_party/protobuf' 2022-12-01T10:14:50.6717780Z Entering 'third_party/protobuf/third_party/benchmark' 2022-12-01T10:14:50.6765100Z Entering 'third_party/protobuf/third_party/googletest' 2022-12-01T10:14:50.6809446Z Entering 'third_party/psimd' 2022-12-01T10:14:50.6852906Z Entering 'third_party/pthreadpool' 2022-12-01T10:14:50.6895553Z Entering 'third_party/pybind11' 2022-12-01T10:14:50.6938609Z Entering 'third_party/python-enum' 2022-12-01T10:14:50.6983149Z Entering 'third_party/python-peachpy' 2022-12-01T10:14:50.7026232Z Entering 'third_party/python-six' 2022-12-01T10:14:50.7069983Z Entering 'third_party/sleef' 2022-12-01T10:14:50.7113679Z Entering 'third_party/tbb' 2022-12-01T10:14:50.7159347Z Entering 'third_party/tensorpipe' 2022-12-01T10:14:50.7203124Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-12-01T10:14:50.7246408Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-12-01T10:14:50.7288656Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-12-01T10:14:50.7331184Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-12-01T10:14:50.7373000Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-12-01T10:14:50.7419467Z Entering 'third_party/zstd' 2022-12-01T10:14:50.7474882Z ##[endgroup] 2022-12-01T10:14:50.7475429Z ##[group]Persisting credentials for submodules 2022-12-01T10:14:50.7481433Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || : 2022-12-01T10:14:50.7808730Z Entering 'android/libs/fbjni' 2022-12-01T10:14:50.7851730Z Entering 'third_party/FP16' 2022-12-01T10:14:50.7894831Z Entering 'third_party/FXdiv' 2022-12-01T10:14:50.7937996Z Entering 'third_party/NNPACK' 2022-12-01T10:14:50.7981960Z Entering 
'third_party/QNNPACK' 2022-12-01T10:14:50.8025426Z Entering 'third_party/VulkanMemoryAllocator' 2022-12-01T10:14:50.8068835Z Entering 'third_party/XNNPACK' 2022-12-01T10:14:50.8124482Z Entering 'third_party/benchmark' 2022-12-01T10:14:50.8169276Z Entering 'third_party/cpuinfo' 2022-12-01T10:14:50.8214500Z Entering 'third_party/cub' 2022-12-01T10:14:50.8259034Z Entering 'third_party/cudnn_frontend' 2022-12-01T10:14:50.8308872Z Entering 'third_party/cutlass' 2022-12-01T10:14:50.8359422Z Entering 'third_party/eigen' 2022-12-01T10:14:50.8405731Z Entering 'third_party/fbgemm' 2022-12-01T10:14:50.8449584Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-12-01T10:14:50.8492910Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-12-01T10:14:50.8535974Z Entering 'third_party/fbgemm/third_party/googletest' 2022-12-01T10:14:50.8580563Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2022-12-01T10:14:50.8624122Z Entering 'third_party/flatbuffers' 2022-12-01T10:14:50.8669331Z Entering 'third_party/fmt' 2022-12-01T10:14:50.8712311Z Entering 'third_party/foxi' 2022-12-01T10:14:50.8759354Z Entering 'third_party/gemmlowp/gemmlowp' 2022-12-01T10:14:50.8801918Z Entering 'third_party/gloo' 2022-12-01T10:14:50.8844412Z Entering 'third_party/googletest' 2022-12-01T10:14:50.8886950Z Entering 'third_party/ideep' 2022-12-01T10:14:50.8928778Z Entering 'third_party/ideep/mkl-dnn' 2022-12-01T10:14:50.8972278Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-12-01T10:14:50.9022501Z Entering 'third_party/ios-cmake' 2022-12-01T10:14:50.9066191Z Entering 'third_party/ittapi' 2022-12-01T10:14:50.9112289Z Entering 'third_party/kineto' 2022-12-01T10:14:50.9156427Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-12-01T10:14:50.9200592Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-12-01T10:14:50.9247390Z Entering 'third_party/nccl/nccl' 2022-12-01T10:14:50.9291115Z Entering 'third_party/neon2sse' 2022-12-01T10:14:50.9338404Z Entering 'third_party/nlohmann' 2022-12-01T10:14:50.9382751Z Entering 'third_party/onnx' 2022-12-01T10:14:50.9438406Z Entering 'third_party/onnx/third_party/benchmark' 2022-12-01T10:14:50.9481184Z Entering 'third_party/onnx/third_party/pybind11' 2022-12-01T10:14:50.9528513Z Entering 'third_party/onnx-tensorrt' 2022-12-01T10:14:50.9570051Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-12-01T10:14:50.9617329Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-12-01T10:14:50.9661962Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-12-01T10:14:50.9704821Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-12-01T10:14:50.9754108Z Entering 'third_party/pocketfft' 2022-12-01T10:14:50.9798761Z Entering 'third_party/protobuf' 2022-12-01T10:14:50.9846867Z Entering 'third_party/protobuf/third_party/benchmark' 2022-12-01T10:14:50.9888560Z Entering 'third_party/protobuf/third_party/googletest' 2022-12-01T10:14:50.9933106Z Entering 'third_party/psimd' 2022-12-01T10:14:50.9975531Z Entering 'third_party/pthreadpool' 2022-12-01T10:14:51.0019050Z Entering 'third_party/pybind11' 2022-12-01T10:14:51.0061359Z Entering 'third_party/python-enum' 2022-12-01T10:14:51.0104891Z Entering 'third_party/python-peachpy' 2022-12-01T10:14:51.0148309Z Entering 'third_party/python-six' 2022-12-01T10:14:51.0190207Z Entering 'third_party/sleef' 2022-12-01T10:14:51.0233521Z Entering 'third_party/tbb' 2022-12-01T10:14:51.0277428Z Entering 'third_party/tensorpipe' 
2022-12-01T10:14:51.0321239Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-12-01T10:14:51.0362948Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-12-01T10:14:51.0404741Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-12-01T10:14:51.0448383Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-12-01T10:14:51.0489182Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-12-01T10:14:51.0534625Z Entering 'third_party/zstd' 2022-12-01T10:14:51.0590951Z [command]/usr/bin/git submodule foreach --recursive git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url 2022-12-01T10:14:51.0972608Z Entering 'android/libs/fbjni' 2022-12-01T10:14:51.1012129Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2022-12-01T10:14:51.1030023Z Entering 'third_party/FP16' 2022-12-01T10:14:51.1070920Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2022-12-01T10:14:51.1089001Z Entering 'third_party/FXdiv' 2022-12-01T10:14:51.1130871Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2022-12-01T10:14:51.1148272Z Entering 'third_party/NNPACK' 2022-12-01T10:14:51.1188408Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2022-12-01T10:14:51.1207021Z Entering 'third_party/QNNPACK' 2022-12-01T10:14:51.1246568Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/QNNPACK/config remote.origin.url 2022-12-01T10:14:51.1264440Z Entering 'third_party/VulkanMemoryAllocator' 2022-12-01T10:14:51.1304072Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2022-12-01T10:14:51.1321890Z Entering 'third_party/XNNPACK' 2022-12-01T10:14:51.1363095Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2022-12-01T10:14:51.1392017Z Entering 'third_party/benchmark' 2022-12-01T10:14:51.1431129Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2022-12-01T10:14:51.1449782Z Entering 'third_party/cpuinfo' 2022-12-01T10:14:51.1490361Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2022-12-01T10:14:51.1508693Z Entering 'third_party/cub' 2022-12-01T10:14:51.1549008Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cub/config remote.origin.url 2022-12-01T10:14:51.1567262Z Entering 'third_party/cudnn_frontend' 2022-12-01T10:14:51.1607150Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2022-12-01T10:14:51.1630561Z Entering 'third_party/cutlass' 2022-12-01T10:14:51.1669881Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2022-12-01T10:14:51.1695883Z Entering 'third_party/eigen' 2022-12-01T10:14:51.1734902Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/eigen/config remote.origin.url 2022-12-01T10:14:51.1754189Z Entering 'third_party/fbgemm' 2022-12-01T10:14:51.1794318Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2022-12-01T10:14:51.1812613Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-12-01T10:14:51.1853114Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/asmjit/config remote.origin.url 2022-12-01T10:14:51.1870487Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-12-01T10:14:51.1909104Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/cpuinfo/config remote.origin.url 2022-12-01T10:14:51.1927378Z Entering 'third_party/fbgemm/third_party/googletest' 2022-12-01T10:14:51.1966958Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/googletest/config remote.origin.url 2022-12-01T10:14:51.1984314Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2022-12-01T10:14:51.2023132Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/hipify_torch/config remote.origin.url 2022-12-01T10:14:51.2041450Z Entering 'third_party/flatbuffers' 2022-12-01T10:14:51.2081030Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2022-12-01T10:14:51.2100402Z Entering 'third_party/fmt' 2022-12-01T10:14:51.2139327Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2022-12-01T10:14:51.2157133Z Entering 'third_party/foxi' 2022-12-01T10:14:51.2196743Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/foxi/config remote.origin.url 2022-12-01T10:14:51.2214412Z Entering 'third_party/gemmlowp/gemmlowp' 2022-12-01T10:14:51.2254366Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2022-12-01T10:14:51.2272118Z Entering 'third_party/gloo' 2022-12-01T10:14:51.2350107Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2022-12-01T10:14:51.2368362Z Entering 'third_party/googletest' 2022-12-01T10:14:51.2407590Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2022-12-01T10:14:51.2425277Z Entering 'third_party/ideep' 2022-12-01T10:14:51.2467798Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2022-12-01T10:14:51.2485354Z Entering 'third_party/ideep/mkl-dnn' 2022-12-01T10:14:51.2576014Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2022-12-01T10:14:51.2594679Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-12-01T10:14:51.2634711Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/modules/third_party/oneDNN/config remote.origin.url 2022-12-01T10:14:51.2659444Z Entering 'third_party/ios-cmake' 2022-12-01T10:14:51.2699160Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ios-cmake/config remote.origin.url 2022-12-01T10:14:51.2716503Z Entering 'third_party/ittapi' 2022-12-01T10:14:51.2756636Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2022-12-01T10:14:51.2774350Z Entering 'third_party/kineto' 
2022-12-01T10:14:51.2813044Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2022-12-01T10:14:51.2830430Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-12-01T10:14:51.2870605Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2022-12-01T10:14:51.2888769Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-12-01T10:14:51.2927895Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2022-12-01T10:14:51.2946910Z Entering 'third_party/nccl/nccl' 2022-12-01T10:14:51.2986924Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nccl/nccl/config remote.origin.url 2022-12-01T10:14:51.3004790Z Entering 'third_party/neon2sse' 2022-12-01T10:14:51.3044236Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/neon2sse/config remote.origin.url 2022-12-01T10:14:51.3062809Z Entering 'third_party/nlohmann' 2022-12-01T10:14:51.3102256Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2022-12-01T10:14:51.3121744Z Entering 'third_party/onnx' 2022-12-01T10:14:51.3160587Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2022-12-01T10:14:51.3191683Z Entering 'third_party/onnx/third_party/benchmark' 2022-12-01T10:14:51.3230077Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url 2022-12-01T10:14:51.3259650Z Entering 'third_party/onnx/third_party/pybind11' 2022-12-01T10:14:51.3297606Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2022-12-01T10:14:51.3317383Z Entering 'third_party/onnx-tensorrt' 2022-12-01T10:14:51.3356553Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/config remote.origin.url 2022-12-01T10:14:51.3375061Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-12-01T10:14:51.3413851Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/config remote.origin.url 2022-12-01T10:14:51.3436484Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-12-01T10:14:51.3476580Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url 2022-12-01T10:14:51.3495906Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-12-01T10:14:51.3535468Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2022-12-01T10:14:51.3552664Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-12-01T10:14:51.3592221Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2022-12-01T10:14:51.3615256Z Entering 'third_party/pocketfft' 2022-12-01T10:14:51.3654627Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2022-12-01T10:14:51.3671962Z Entering 'third_party/protobuf' 2022-12-01T10:14:51.3713098Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2022-12-01T10:14:51.3734329Z Entering 'third_party/protobuf/third_party/benchmark' 2022-12-01T10:14:51.3772976Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2022-12-01T10:14:51.3789925Z Entering 'third_party/protobuf/third_party/googletest' 2022-12-01T10:14:51.3829392Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2022-12-01T10:14:51.3849252Z Entering 'third_party/psimd' 2022-12-01T10:14:51.3888390Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2022-12-01T10:14:51.3905868Z Entering 'third_party/pthreadpool' 2022-12-01T10:14:51.3945326Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2022-12-01T10:14:51.3962858Z Entering 'third_party/pybind11' 2022-12-01T10:14:51.4001440Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2022-12-01T10:14:51.4019746Z Entering 'third_party/python-enum' 2022-12-01T10:14:51.4059230Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-enum/config remote.origin.url 2022-12-01T10:14:51.4076694Z Entering 'third_party/python-peachpy' 2022-12-01T10:14:51.4116294Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2022-12-01T10:14:51.4134153Z Entering 'third_party/python-six' 2022-12-01T10:14:51.4173410Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-six/config remote.origin.url 2022-12-01T10:14:51.4190806Z Entering 'third_party/sleef' 2022-12-01T10:14:51.4230404Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2022-12-01T10:14:51.4250930Z Entering 'third_party/tbb' 2022-12-01T10:14:51.4289680Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tbb/config remote.origin.url 2022-12-01T10:14:51.4309178Z Entering 'third_party/tensorpipe' 2022-12-01T10:14:51.4349059Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2022-12-01T10:14:51.4367771Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-12-01T10:14:51.4407075Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2022-12-01T10:14:51.4424437Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-12-01T10:14:51.4463637Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2022-12-01T10:14:51.4480642Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-12-01T10:14:51.4519255Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2022-12-01T10:14:51.4537541Z Entering 
'third_party/tensorpipe/third_party/pybind11' 2022-12-01T10:14:51.4577192Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2022-12-01T10:14:51.4593975Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-12-01T10:14:51.4633073Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2022-12-01T10:14:51.4653717Z Entering 'third_party/zstd' 2022-12-01T10:14:51.4692057Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/zstd/config remote.origin.url 2022-12-01T10:14:51.5665453Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2022-12-01T10:14:51.6006225Z Entering 'android/libs/fbjni' 2022-12-01T10:14:51.6051248Z Entering 'third_party/FP16' 2022-12-01T10:14:51.6095018Z Entering 'third_party/FXdiv' 2022-12-01T10:14:51.6138327Z Entering 'third_party/NNPACK' 2022-12-01T10:14:51.6183103Z Entering 'third_party/QNNPACK' 2022-12-01T10:14:51.6236616Z Entering 'third_party/VulkanMemoryAllocator' 2022-12-01T10:14:51.6281372Z Entering 'third_party/XNNPACK' 2022-12-01T10:14:51.6336666Z Entering 'third_party/benchmark' 2022-12-01T10:14:51.6381750Z Entering 'third_party/cpuinfo' 2022-12-01T10:14:51.6441789Z Entering 'third_party/cub' 2022-12-01T10:14:51.6486249Z Entering 'third_party/cudnn_frontend' 2022-12-01T10:14:51.6554815Z Entering 'third_party/cutlass' 2022-12-01T10:14:51.6607562Z Entering 'third_party/eigen' 2022-12-01T10:14:51.6654756Z Entering 'third_party/fbgemm' 2022-12-01T10:14:51.6699335Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-12-01T10:14:51.6902438Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-12-01T10:14:51.6946493Z Entering 'third_party/fbgemm/third_party/googletest' 2022-12-01T10:14:51.6991410Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2022-12-01T10:14:51.7035962Z Entering 'third_party/flatbuffers' 2022-12-01T10:14:51.7082419Z Entering 'third_party/fmt' 2022-12-01T10:14:51.7127455Z Entering 'third_party/foxi' 2022-12-01T10:14:51.7172017Z Entering 'third_party/gemmlowp/gemmlowp' 2022-12-01T10:14:51.7217363Z Entering 'third_party/gloo' 2022-12-01T10:14:51.7263643Z Entering 'third_party/googletest' 2022-12-01T10:14:51.7308365Z Entering 'third_party/ideep' 2022-12-01T10:14:51.7354369Z Entering 'third_party/ideep/mkl-dnn' 2022-12-01T10:14:51.7401409Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-12-01T10:14:51.7452319Z Entering 'third_party/ios-cmake' 2022-12-01T10:14:51.7496772Z Entering 'third_party/ittapi' 2022-12-01T10:14:51.7540083Z Entering 'third_party/kineto' 2022-12-01T10:14:51.7585541Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-12-01T10:14:51.7629719Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-12-01T10:14:51.7674759Z Entering 'third_party/nccl/nccl' 2022-12-01T10:14:51.7721808Z Entering 'third_party/neon2sse' 2022-12-01T10:14:51.7765011Z Entering 'third_party/nlohmann' 2022-12-01T10:14:51.7810105Z Entering 'third_party/onnx' 2022-12-01T10:14:51.8019566Z Entering 'third_party/onnx/third_party/benchmark' 2022-12-01T10:14:51.8064269Z Entering 'third_party/onnx/third_party/pybind11' 2022-12-01T10:14:51.8111383Z Entering 'third_party/onnx-tensorrt' 2022-12-01T10:14:51.8154472Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-12-01T10:14:51.8201544Z 
Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-12-01T10:14:51.8245271Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-12-01T10:14:51.8289081Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-12-01T10:14:51.8336698Z Entering 'third_party/pocketfft' 2022-12-01T10:14:51.8380338Z Entering 'third_party/protobuf' 2022-12-01T10:14:51.8427160Z Entering 'third_party/protobuf/third_party/benchmark' 2022-12-01T10:14:51.8469843Z Entering 'third_party/protobuf/third_party/googletest' 2022-12-01T10:14:51.8516072Z Entering 'third_party/psimd' 2022-12-01T10:14:51.8560664Z Entering 'third_party/pthreadpool' 2022-12-01T10:14:51.8605085Z Entering 'third_party/pybind11' 2022-12-01T10:14:51.8663161Z Entering 'third_party/python-enum' 2022-12-01T10:14:51.8705923Z Entering 'third_party/python-peachpy' 2022-12-01T10:14:51.8754850Z Entering 'third_party/python-six' 2022-12-01T10:14:51.8797633Z Entering 'third_party/sleef' 2022-12-01T10:14:51.8840836Z Entering 'third_party/tbb' 2022-12-01T10:14:51.8886231Z Entering 'third_party/tensorpipe' 2022-12-01T10:14:51.8930360Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-12-01T10:14:51.8973022Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-12-01T10:14:51.9028568Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-12-01T10:14:51.9071542Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-12-01T10:14:51.9112882Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-12-01T10:14:51.9162247Z Entering 'third_party/zstd' 2022-12-01T10:14:51.9220318Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2022-12-01T10:14:51.9568388Z Entering 'android/libs/fbjni' 2022-12-01T10:14:51.9610557Z Entering 'third_party/FP16' 2022-12-01T10:14:51.9653770Z Entering 'third_party/FXdiv' 2022-12-01T10:14:51.9696959Z Entering 'third_party/NNPACK' 2022-12-01T10:14:51.9740250Z Entering 'third_party/QNNPACK' 2022-12-01T10:14:51.9784416Z Entering 'third_party/VulkanMemoryAllocator' 2022-12-01T10:14:51.9827412Z Entering 'third_party/XNNPACK' 2022-12-01T10:14:51.9881452Z Entering 'third_party/benchmark' 2022-12-01T10:14:51.9924272Z Entering 'third_party/cpuinfo' 2022-12-01T10:14:51.9968079Z Entering 'third_party/cub' 2022-12-01T10:14:52.0011590Z Entering 'third_party/cudnn_frontend' 2022-12-01T10:14:52.0060214Z Entering 'third_party/cutlass' 2022-12-01T10:14:52.0109653Z Entering 'third_party/eigen' 2022-12-01T10:14:52.0155019Z Entering 'third_party/fbgemm' 2022-12-01T10:14:52.0201196Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-12-01T10:14:52.0243494Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-12-01T10:14:52.0286330Z Entering 'third_party/fbgemm/third_party/googletest' 2022-12-01T10:14:52.0329220Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2022-12-01T10:14:52.0372820Z Entering 'third_party/flatbuffers' 2022-12-01T10:14:52.0418268Z Entering 'third_party/fmt' 2022-12-01T10:14:52.0461682Z Entering 'third_party/foxi' 2022-12-01T10:14:52.0505612Z Entering 'third_party/gemmlowp/gemmlowp' 2022-12-01T10:14:52.0549208Z Entering 'third_party/gloo' 2022-12-01T10:14:52.0593755Z Entering 'third_party/googletest' 2022-12-01T10:14:52.0637593Z Entering 'third_party/ideep' 2022-12-01T10:14:52.0680321Z Entering 'third_party/ideep/mkl-dnn' 2022-12-01T10:14:52.0725658Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 
2022-12-01T10:14:52.0776134Z Entering 'third_party/ios-cmake' 2022-12-01T10:14:52.0818689Z Entering 'third_party/ittapi' 2022-12-01T10:14:52.0862806Z Entering 'third_party/kineto' 2022-12-01T10:14:52.0905390Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-12-01T10:14:52.0948034Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-12-01T10:14:52.0993822Z Entering 'third_party/nccl/nccl' 2022-12-01T10:14:52.1037952Z Entering 'third_party/neon2sse' 2022-12-01T10:14:52.1080998Z Entering 'third_party/nlohmann' 2022-12-01T10:14:52.1126166Z Entering 'third_party/onnx' 2022-12-01T10:14:52.1184070Z Entering 'third_party/onnx/third_party/benchmark' 2022-12-01T10:14:52.1229974Z Entering 'third_party/onnx/third_party/pybind11' 2022-12-01T10:14:52.1277166Z Entering 'third_party/onnx-tensorrt' 2022-12-01T10:14:52.1319342Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-12-01T10:14:52.1367740Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-12-01T10:14:52.1412069Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-12-01T10:14:52.1456791Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-12-01T10:14:52.1505774Z Entering 'third_party/pocketfft' 2022-12-01T10:14:52.1549692Z Entering 'third_party/protobuf' 2022-12-01T10:14:52.1598318Z Entering 'third_party/protobuf/third_party/benchmark' 2022-12-01T10:14:52.1643927Z Entering 'third_party/protobuf/third_party/googletest' 2022-12-01T10:14:52.1692201Z Entering 'third_party/psimd' 2022-12-01T10:14:52.1737143Z Entering 'third_party/pthreadpool' 2022-12-01T10:14:52.1782289Z Entering 'third_party/pybind11' 2022-12-01T10:14:52.1829995Z Entering 'third_party/python-enum' 2022-12-01T10:14:52.1875192Z Entering 'third_party/python-peachpy' 2022-12-01T10:14:52.1920419Z Entering 'third_party/python-six' 2022-12-01T10:14:52.1965890Z Entering 'third_party/sleef' 2022-12-01T10:14:52.2011389Z Entering 'third_party/tbb' 2022-12-01T10:14:52.2058738Z Entering 'third_party/tensorpipe' 2022-12-01T10:14:52.2103376Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-12-01T10:14:52.2148556Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-12-01T10:14:52.2193282Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-12-01T10:14:52.2238642Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-12-01T10:14:52.2283879Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-12-01T10:14:52.2330823Z Entering 'third_party/zstd' 2022-12-01T10:14:52.2384868Z ##[endgroup] 2022-12-01T10:14:52.2432805Z [command]/usr/bin/git log -1 --format='%H' 2022-12-01T10:14:52.2464918Z 'c13d400bffe90e16b96520bbc8a41a6f0c9cd584' 2022-12-01T10:14:52.2635942Z Prepare all required actions 2022-12-01T10:14:52.2708023Z ##[group]Run ./.github/actions/setup-linux 2022-12-01T10:14:52.2708283Z env: 2022-12-01T10:14:52.2708520Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:14:52.2708758Z ##[endgroup] 2022-12-01T10:14:52.2730276Z ##[group]Run set -euo pipefail 2022-12-01T10:14:52.2730591Z set -euo pipefail 2022-12-01T10:14:52.2730878Z function get_ec2_metadata() { 2022-12-01T10:14:52.2731210Z  # Pulled from instance metadata endpoint for EC2 2022-12-01T10:14:52.2731685Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2022-12-01T10:14:52.2732071Z  category=$1 2022-12-01T10:14:52.2732393Z  curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2022-12-01T10:14:52.2732857Z } 
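# The get_ec2_metadata function above reads the EC2 instance-metadata service (IMDS) at
# 169.254.169.254 with a plain GET (IMDSv1). On instances configured for IMDSv2 only, the
# same lookup needs a session token first -- a hedged sketch, not part of what this job ran:
#   TOKEN=$(curl -fsSL -X PUT "http://169.254.169.254/latest/api/token" \
#     -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
#   curl -fsSL -H "X-aws-ec2-metadata-token: ${TOKEN}" \
#     "http://169.254.169.254/latest/meta-data/instance-type"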
2022-12-01T10:14:52.2733117Z echo "ami-id: $(get_ec2_metadata ami-id)" 2022-12-01T10:14:52.2733499Z echo "instance-id: $(get_ec2_metadata instance-id)" 2022-12-01T10:14:52.2733875Z echo "instance-type: $(get_ec2_metadata instance-type)" 2022-12-01T10:14:52.2734195Z echo "system info $(uname -a)" 2022-12-01T10:14:52.2747664Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T10:14:52.2747958Z env: 2022-12-01T10:14:52.2748180Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:14:52.2748433Z ##[endgroup] 2022-12-01T10:14:52.2849743Z ami-id: ami-096198a0bccc6bad4 2022-12-01T10:14:52.2913299Z instance-id: i-0eaaa5984457e9076 2022-12-01T10:14:52.2975707Z instance-type: g3.8xlarge 2022-12-01T10:14:52.2984386Z system info Linux ip-10-0-0-161.ec2.internal 4.14.252-195.483.amzn2.x86_64 #1 SMP Mon Nov 1 20:58:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux 2022-12-01T10:14:52.3006580Z ##[group]Run if systemctl is-active --quiet docker; then 2022-12-01T10:14:52.3006945Z if systemctl is-active --quiet docker; then 2022-12-01T10:14:52.3007267Z  echo "Docker daemon is running..."; 2022-12-01T10:14:52.3007521Z else 2022-12-01T10:14:52.3007821Z  echo "Starting docker deamon..." && sudo systemctl start docker; 2022-12-01T10:14:52.3008114Z fi 2022-12-01T10:14:52.3020277Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T10:14:52.3020551Z env: 2022-12-01T10:14:52.3020785Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:14:52.3021033Z ##[endgroup] 2022-12-01T10:14:52.3071834Z Docker daemon is running... 2022-12-01T10:14:52.3093893Z ##[group]Run AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2022-12-01T10:14:52.3094356Z AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2022-12-01T10:14:52.3094736Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2022-12-01T10:14:52.3095233Z retry aws ecr get-login*** "$AWS_DEFAULT_REGION" | docker login --username AWS \ 2022-12-01T10:14:52.3095698Z  --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" 2022-12-01T10:14:52.3107673Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T10:14:52.3107966Z env: 2022-12-01T10:14:52.3108202Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:14:52.3108452Z AWS_RETRY_MODE: standard 2022-12-01T10:14:52.3108694Z AWS_MAX_ATTEMPTS: 5 2022-12-01T10:14:52.3108960Z AWS_DEFAULT_REGION: us-east-1 2022-12-01T10:14:52.3109202Z ##[endgroup] 2022-12-01T10:14:53.2655277Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2022-12-01T10:14:53.2655767Z Configure a credential helper to remove this warning. 
See 2022-12-01T10:14:53.2656315Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2022-12-01T10:14:53.2656581Z 2022-12-01T10:14:53.2656700Z Login Succeeded 2022-12-01T10:14:53.2695751Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2022-12-01T10:14:53.2696157Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2022-12-01T10:14:53.2696640Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2022-12-01T10:14:53.2709823Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T10:14:53.2710123Z env: 2022-12-01T10:14:53.2710362Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:14:53.2710604Z ##[endgroup] 2022-12-01T10:14:53.2810981Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main 2022-12-01T10:14:53.2811344Z with: 2022-12-01T10:14:53.2811800Z github-secret: *** 2022-12-01T10:14:53.2812099Z activate-with-label: false 2022-12-01T10:14:53.2812363Z label: with-ssh 2022-12-01T10:14:53.2812637Z remove-existing-keys: true 2022-12-01T10:14:53.2812903Z env: 2022-12-01T10:14:53.2813136Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:14:53.2813400Z ##[endgroup] 2022-12-01T10:14:53.8628489Z Grabbing public ssh keys from https://github.com/charlie-wt.keys 2022-12-01T10:14:53.9471474Z ~/.ssh/authorized_keys file found on node, removing ~/.ssh and starting fresh 2022-12-01T10:14:53.9493764Z Public keys pulled and installed to /home/ec2-user/.ssh/authorized_keys 2022-12-01T10:14:53.9537505Z Login using: ssh ec2-user@ec2-54-211-197-104.compute-1.amazonaws.com 2022-12-01T10:14:53.9615217Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2022-12-01T10:14:53.9615578Z with: 2022-12-01T10:14:53.9616049Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fa72f5a0a230eb632055220542038bd4ceca184b 2022-12-01T10:14:53.9616510Z env: 2022-12-01T10:14:53.9616752Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:14:53.9616992Z ##[endgroup] 2022-12-01T10:14:53.9634793Z ##[group]Run retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2022-12-01T10:14:53.9635377Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2022-12-01T10:14:53.9635757Z # ignore output since only exit code is used for conditional 2022-12-01T10:14:53.9636140Z # only pull docker image if it's not available locally 2022-12-01T10:14:53.9636526Z if ! 
docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2022-12-01T10:14:53.9636940Z  retry docker pull "${DOCKER_IMAGE}" 2022-12-01T10:14:53.9637218Z fi 2022-12-01T10:14:53.9650817Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T10:14:53.9651118Z env: 2022-12-01T10:14:53.9651358Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:14:53.9651857Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fa72f5a0a230eb632055220542038bd4ceca184b 2022-12-01T10:14:53.9652336Z ##[endgroup] 2022-12-01T10:14:54.2109490Z fa72f5a0a230eb632055220542038bd4ceca184b: Pulling from pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 2022-12-01T10:14:54.2109992Z 40dd5be53814: Pulling fs layer 2022-12-01T10:14:54.2110523Z bd44602516a4: Pulling fs layer 2022-12-01T10:14:54.2111008Z 8ebfb31ea67d: Pulling fs layer 2022-12-01T10:14:54.2111326Z 1589dc294916: Pulling fs layer 2022-12-01T10:14:54.2111829Z 2c3a764ff1ef: Pulling fs layer 2022-12-01T10:14:54.2112332Z 2fb24fb5f7cb: Pulling fs layer 2022-12-01T10:14:54.2112633Z d6e4b45751c9: Pulling fs layer 2022-12-01T10:14:54.2112906Z 98a26bc0781e: Pulling fs layer 2022-12-01T10:14:54.2113228Z 1589dc294916: Waiting 2022-12-01T10:14:54.2115050Z 07c42b0591b2: Pulling fs layer 2022-12-01T10:14:54.2115510Z 2fb24fb5f7cb: Waiting 2022-12-01T10:14:54.2115972Z a7b4d9098b01: Pulling fs layer 2022-12-01T10:14:54.2116252Z 9b965084d7f3: Pulling fs layer 2022-12-01T10:14:54.2116496Z 2c3a764ff1ef: Waiting 2022-12-01T10:14:54.2116782Z 306938abc720: Pulling fs layer 2022-12-01T10:14:54.2123680Z d6e4b45751c9: Waiting 2022-12-01T10:14:54.2124305Z a7b4d9098b01: Waiting 2022-12-01T10:14:54.2124791Z 06e616940749: Pulling fs layer 2022-12-01T10:14:54.2125365Z 9b965084d7f3: Waiting 2022-12-01T10:14:54.2125887Z 540637333efa: Pulling fs layer 2022-12-01T10:14:54.2126338Z 06e616940749: Waiting 2022-12-01T10:14:54.2126676Z 540637333efa: Waiting 2022-12-01T10:14:54.2126923Z 5cd3ebaf3e1b: Pulling fs layer 2022-12-01T10:14:54.2127207Z dd7a31bfac90: Pulling fs layer 2022-12-01T10:14:54.2127653Z 6416ca3f405b: Pulling fs layer 2022-12-01T10:14:54.2127931Z 19c44dd1f4cf: Pulling fs layer 2022-12-01T10:14:54.2128199Z 33688bb73038: Pulling fs layer 2022-12-01T10:14:54.2128469Z 570b8809d75f: Pulling fs layer 2022-12-01T10:14:54.2128713Z 6416ca3f405b: Waiting 2022-12-01T10:14:54.2128958Z dd7a31bfac90: Waiting 2022-12-01T10:14:54.2129197Z 33688bb73038: Waiting 2022-12-01T10:14:54.2129435Z 3ea8b8f31abf: Pulling fs layer 2022-12-01T10:14:54.2129700Z 5cd3ebaf3e1b: Waiting 2022-12-01T10:14:54.2129964Z 690eecb8c64c: Pulling fs layer 2022-12-01T10:14:54.2130394Z e2285545a8bf: Pulling fs layer 2022-12-01T10:14:54.2130987Z 555faed4e286: Pulling fs layer 2022-12-01T10:14:54.2131755Z 3ea8b8f31abf: Waiting 2022-12-01T10:14:54.2132017Z c53c37f056ac: Pulling fs layer 2022-12-01T10:14:54.2132311Z 75bd8ea35c95: Pulling fs layer 2022-12-01T10:14:54.2132842Z b18c1d97e9da: Pulling fs layer 2022-12-01T10:14:54.2133404Z d90720873b3b: Pulling fs layer 2022-12-01T10:14:54.2133821Z 690eecb8c64c: Waiting 2022-12-01T10:14:54.2134346Z b18c1d97e9da: Waiting 2022-12-01T10:14:54.2135055Z 53b1597ec1c2: Pulling fs layer 2022-12-01T10:14:54.2135378Z ef984347a312: Pulling fs layer 2022-12-01T10:14:54.2135630Z d90720873b3b: Waiting 2022-12-01T10:14:54.2135890Z 343a1b6dc08c: Pulling fs layer 2022-12-01T10:14:54.2136129Z 53b1597ec1c2: Waiting 2022-12-01T10:14:54.2136370Z ef984347a312: Waiting 2022-12-01T10:14:54.2136609Z 75bd8ea35c95: Waiting 
2022-12-01T10:14:54.2136848Z 16fdefc9f32c: Pulling fs layer 2022-12-01T10:14:54.2137119Z de0913f5235c: Pulling fs layer 2022-12-01T10:14:54.2137430Z bf99e81d1939: Pulling fs layer 2022-12-01T10:14:54.2137704Z c0f36b953c44: Pulling fs layer 2022-12-01T10:14:54.2137958Z 7456bfaa70b2: Pulling fs layer 2022-12-01T10:14:54.2138220Z c53c37f056ac: Waiting 2022-12-01T10:14:54.2138461Z e2285545a8bf: Waiting 2022-12-01T10:14:54.2138685Z 343a1b6dc08c: Waiting 2022-12-01T10:14:54.2138930Z 16fdefc9f32c: Waiting 2022-12-01T10:14:54.2139172Z c0f36b953c44: Waiting 2022-12-01T10:14:54.2139393Z de0913f5235c: Waiting 2022-12-01T10:14:54.2139657Z 68e8ef95b145: Pulling fs layer 2022-12-01T10:14:54.2139929Z e497d45bb922: Pulling fs layer 2022-12-01T10:14:54.2140167Z 7456bfaa70b2: Waiting 2022-12-01T10:14:54.2140591Z bf4a66dadd84: Pulling fs layer 2022-12-01T10:14:54.2140876Z fd0d4928961f: Pulling fs layer 2022-12-01T10:14:54.2141244Z 68e8ef95b145: Waiting 2022-12-01T10:14:54.2141516Z 6df8346812f8: Pulling fs layer 2022-12-01T10:14:54.2141946Z 8b46f8aa8681: Pulling fs layer 2022-12-01T10:14:54.2142237Z fd0d4928961f: Waiting 2022-12-01T10:14:54.2142631Z 6db5688b5394: Pulling fs layer 2022-12-01T10:14:54.2142898Z bf4a66dadd84: Waiting 2022-12-01T10:14:54.2143142Z daa053ac35f3: Pulling fs layer 2022-12-01T10:14:54.2143421Z c9a8257f8dc0: Pulling fs layer 2022-12-01T10:14:54.2143687Z 30f21817172d: Pulling fs layer 2022-12-01T10:14:54.2143921Z 6df8346812f8: Waiting 2022-12-01T10:14:54.2144176Z bb9282afce05: Pulling fs layer 2022-12-01T10:14:54.2144443Z 909c475d21ec: Pulling fs layer 2022-12-01T10:14:54.2144677Z 6db5688b5394: Waiting 2022-12-01T10:14:54.2144933Z 192b8944f15a: Pulling fs layer 2022-12-01T10:14:54.2145202Z 3dd92c832839: Pulling fs layer 2022-12-01T10:14:54.2145454Z 5e1b0146a21d: Pulling fs layer 2022-12-01T10:14:54.2145705Z 909c475d21ec: Waiting 2022-12-01T10:14:54.2145945Z bb9282afce05: Waiting 2022-12-01T10:14:54.2146181Z c76cf0339419: Pulling fs layer 2022-12-01T10:14:54.2146451Z 2dc4928eecf0: Pulling fs layer 2022-12-01T10:14:54.2146711Z 192b8944f15a: Waiting 2022-12-01T10:14:54.2146929Z 3dd92c832839: Waiting 2022-12-01T10:14:54.2147164Z c76cf0339419: Waiting 2022-12-01T10:14:54.2147397Z 5e1b0146a21d: Waiting 2022-12-01T10:14:54.4044583Z bd44602516a4: Download complete 2022-12-01T10:14:54.4897518Z 1589dc294916: Verifying Checksum 2022-12-01T10:14:54.4898072Z 1589dc294916: Download complete 2022-12-01T10:14:54.5707024Z 2c3a764ff1ef: Download complete 2022-12-01T10:14:54.6080279Z 40dd5be53814: Verifying Checksum 2022-12-01T10:14:54.6080851Z 40dd5be53814: Download complete 2022-12-01T10:14:54.6490071Z 8ebfb31ea67d: Verifying Checksum 2022-12-01T10:14:54.6490394Z 8ebfb31ea67d: Download complete 2022-12-01T10:14:54.7141336Z d6e4b45751c9: Verifying Checksum 2022-12-01T10:14:54.7141667Z d6e4b45751c9: Download complete 2022-12-01T10:14:54.8025939Z 07c42b0591b2: Download complete 2022-12-01T10:14:54.8830839Z a7b4d9098b01: Download complete 2022-12-01T10:14:55.4984405Z 40dd5be53814: Pull complete 2022-12-01T10:14:55.7958012Z bd44602516a4: Pull complete 2022-12-01T10:14:56.3377119Z 8ebfb31ea67d: Pull complete 2022-12-01T10:14:56.4689582Z 1589dc294916: Pull complete 2022-12-01T10:14:56.6047002Z 2c3a764ff1ef: Pull complete 2022-12-01T10:14:57.8112279Z 9b965084d7f3: Verifying Checksum 2022-12-01T10:14:57.8112871Z 9b965084d7f3: Download complete 2022-12-01T10:14:57.8852149Z 306938abc720: Verifying Checksum 2022-12-01T10:14:57.8852450Z 306938abc720: Download complete 2022-12-01T10:14:57.9646954Z 06e616940749: 
Download complete 2022-12-01T10:14:58.0450418Z 540637333efa: Download complete 2022-12-01T10:15:00.0941613Z 5cd3ebaf3e1b: Verifying Checksum 2022-12-01T10:15:00.0942599Z 5cd3ebaf3e1b: Download complete 2022-12-01T10:15:00.2095145Z dd7a31bfac90: Verifying Checksum 2022-12-01T10:15:00.2095714Z dd7a31bfac90: Download complete 2022-12-01T10:15:00.3135497Z 6416ca3f405b: Verifying Checksum 2022-12-01T10:15:00.3135831Z 6416ca3f405b: Download complete 2022-12-01T10:15:24.2411842Z 2fb24fb5f7cb: Verifying Checksum 2022-12-01T10:15:24.2412222Z 2fb24fb5f7cb: Download complete 2022-12-01T10:15:24.3393846Z 33688bb73038: Verifying Checksum 2022-12-01T10:15:24.4298370Z 570b8809d75f: Verifying Checksum 2022-12-01T10:15:24.4298981Z 570b8809d75f: Download complete 2022-12-01T10:15:24.5507877Z 3ea8b8f31abf: Download complete 2022-12-01T10:15:24.6514583Z 690eecb8c64c: Download complete 2022-12-01T10:15:24.7169473Z e2285545a8bf: Download complete 2022-12-01T10:15:24.8364456Z 555faed4e286: Verifying Checksum 2022-12-01T10:15:24.8365149Z 555faed4e286: Download complete 2022-12-01T10:15:27.4218621Z c53c37f056ac: Verifying Checksum 2022-12-01T10:15:27.4218984Z c53c37f056ac: Download complete 2022-12-01T10:15:27.5081384Z 75bd8ea35c95: Verifying Checksum 2022-12-01T10:15:27.5081937Z 75bd8ea35c95: Download complete 2022-12-01T10:15:27.6273621Z b18c1d97e9da: Verifying Checksum 2022-12-01T10:15:27.6273955Z b18c1d97e9da: Download complete 2022-12-01T10:15:27.7652399Z d90720873b3b: Download complete 2022-12-01T10:15:27.8443191Z 53b1597ec1c2: Verifying Checksum 2022-12-01T10:15:27.8443497Z 53b1597ec1c2: Download complete 2022-12-01T10:15:27.9276999Z ef984347a312: Verifying Checksum 2022-12-01T10:15:27.9277331Z ef984347a312: Download complete 2022-12-01T10:15:33.8127024Z 98a26bc0781e: Verifying Checksum 2022-12-01T10:15:33.8127428Z 98a26bc0781e: Download complete 2022-12-01T10:15:33.8766064Z 343a1b6dc08c: Verifying Checksum 2022-12-01T10:15:33.8766379Z 343a1b6dc08c: Download complete 2022-12-01T10:15:33.8958364Z 16fdefc9f32c: Verifying Checksum 2022-12-01T10:15:33.8958832Z 16fdefc9f32c: Download complete 2022-12-01T10:15:33.9657113Z de0913f5235c: Verifying Checksum 2022-12-01T10:15:33.9657478Z de0913f5235c: Download complete 2022-12-01T10:15:34.0316444Z c0f36b953c44: Verifying Checksum 2022-12-01T10:15:34.0316756Z c0f36b953c44: Download complete 2022-12-01T10:15:34.1111129Z 7456bfaa70b2: Verifying Checksum 2022-12-01T10:15:34.1111718Z 7456bfaa70b2: Download complete 2022-12-01T10:15:34.6878107Z 68e8ef95b145: Verifying Checksum 2022-12-01T10:15:34.6878727Z 68e8ef95b145: Download complete 2022-12-01T10:15:34.7404711Z bf99e81d1939: Verifying Checksum 2022-12-01T10:15:34.7405042Z bf99e81d1939: Download complete 2022-12-01T10:15:34.7618163Z e497d45bb922: Verifying Checksum 2022-12-01T10:15:34.7618457Z e497d45bb922: Download complete 2022-12-01T10:15:34.8434774Z fd0d4928961f: Verifying Checksum 2022-12-01T10:15:34.8435119Z fd0d4928961f: Download complete 2022-12-01T10:15:34.9397505Z 6df8346812f8: Verifying Checksum 2022-12-01T10:15:34.9398081Z 6df8346812f8: Download complete 2022-12-01T10:15:35.8917776Z bf4a66dadd84: Verifying Checksum 2022-12-01T10:15:35.8918794Z bf4a66dadd84: Download complete 2022-12-01T10:15:36.0361577Z 6db5688b5394: Verifying Checksum 2022-12-01T10:15:36.0361916Z 6db5688b5394: Download complete 2022-12-01T10:15:36.1196065Z daa053ac35f3: Download complete 2022-12-01T10:15:36.1962794Z c9a8257f8dc0: Verifying Checksum 2022-12-01T10:15:36.2983664Z c9a8257f8dc0: Download complete 2022-12-01T10:15:36.2984000Z 
30f21817172d: Verifying Checksum 2022-12-01T10:15:36.2984277Z 30f21817172d: Download complete 2022-12-01T10:15:36.8119199Z bb9282afce05: Verifying Checksum 2022-12-01T10:15:36.8119615Z bb9282afce05: Download complete 2022-12-01T10:15:36.8888835Z 909c475d21ec: Verifying Checksum 2022-12-01T10:15:36.8889145Z 909c475d21ec: Download complete 2022-12-01T10:15:38.3624424Z 192b8944f15a: Verifying Checksum 2022-12-01T10:15:38.3625031Z 192b8944f15a: Download complete 2022-12-01T10:15:38.4431510Z 3dd92c832839: Verifying Checksum 2022-12-01T10:15:38.4431858Z 3dd92c832839: Download complete 2022-12-01T10:15:40.0237534Z 2fb24fb5f7cb: Pull complete 2022-12-01T10:15:42.1818298Z d6e4b45751c9: Pull complete 2022-12-01T10:15:45.1168441Z 8b46f8aa8681: Download complete 2022-12-01T10:15:45.1802864Z c76cf0339419: Verifying Checksum 2022-12-01T10:15:45.1803185Z c76cf0339419: Download complete 2022-12-01T10:15:45.2796174Z 2dc4928eecf0: Verifying Checksum 2022-12-01T10:15:45.2796621Z 2dc4928eecf0: Download complete 2022-12-01T10:15:57.0385875Z 19c44dd1f4cf: Verifying Checksum 2022-12-01T10:15:57.0386276Z 19c44dd1f4cf: Download complete 2022-12-01T10:16:04.8230667Z 98a26bc0781e: Pull complete 2022-12-01T10:16:06.4509035Z 07c42b0591b2: Pull complete 2022-12-01T10:16:06.5718570Z a7b4d9098b01: Pull complete 2022-12-01T10:16:14.1755125Z 9b965084d7f3: Pull complete 2022-12-01T10:16:16.3364539Z 306938abc720: Pull complete 2022-12-01T10:16:18.3356354Z 06e616940749: Pull complete 2022-12-01T10:16:20.2130535Z 540637333efa: Pull complete 2022-12-01T10:16:25.0004962Z 5cd3ebaf3e1b: Pull complete 2022-12-01T10:16:27.2506224Z dd7a31bfac90: Pull complete 2022-12-01T10:16:29.1301728Z 6416ca3f405b: Pull complete 2022-12-01T10:17:04.3153423Z 19c44dd1f4cf: Pull complete 2022-12-01T10:17:06.3243142Z 33688bb73038: Pull complete 2022-12-01T10:17:06.9933992Z 570b8809d75f: Pull complete 2022-12-01T10:17:07.0532059Z 5e1b0146a21d: Verifying Checksum 2022-12-01T10:17:07.0532422Z 5e1b0146a21d: Download complete 2022-12-01T10:17:07.1072733Z 3ea8b8f31abf: Pull complete 2022-12-01T10:17:07.2120850Z 690eecb8c64c: Pull complete 2022-12-01T10:17:07.3070175Z e2285545a8bf: Pull complete 2022-12-01T10:17:07.4320969Z 555faed4e286: Pull complete 2022-12-01T10:17:09.7492819Z c53c37f056ac: Pull complete 2022-12-01T10:17:09.8548403Z 75bd8ea35c95: Pull complete 2022-12-01T10:17:09.9626718Z b18c1d97e9da: Pull complete 2022-12-01T10:17:10.1108165Z d90720873b3b: Pull complete 2022-12-01T10:17:10.2136716Z 53b1597ec1c2: Pull complete 2022-12-01T10:17:10.3234128Z ef984347a312: Pull complete 2022-12-01T10:17:18.3533630Z 343a1b6dc08c: Pull complete 2022-12-01T10:17:20.2604460Z 16fdefc9f32c: Pull complete 2022-12-01T10:17:22.1699247Z de0913f5235c: Pull complete 2022-12-01T10:17:24.8594207Z bf99e81d1939: Pull complete 2022-12-01T10:17:27.4194742Z c0f36b953c44: Pull complete 2022-12-01T10:17:30.5495748Z 7456bfaa70b2: Pull complete 2022-12-01T10:17:33.8652263Z 68e8ef95b145: Pull complete 2022-12-01T10:17:36.1403727Z e497d45bb922: Pull complete 2022-12-01T10:17:40.9029516Z bf4a66dadd84: Pull complete 2022-12-01T10:17:43.2238327Z fd0d4928961f: Pull complete 2022-12-01T10:17:45.0548550Z 6df8346812f8: Pull complete 2022-12-01T10:17:52.4329196Z 8b46f8aa8681: Pull complete 2022-12-01T10:17:54.4369035Z 6db5688b5394: Pull complete 2022-12-01T10:17:56.3678137Z daa053ac35f3: Pull complete 2022-12-01T10:17:58.2777629Z c9a8257f8dc0: Pull complete 2022-12-01T10:17:59.6494678Z 30f21817172d: Pull complete 2022-12-01T10:18:00.5870895Z bb9282afce05: Pull complete 
2022-12-01T10:18:00.8942485Z 909c475d21ec: Pull complete 2022-12-01T10:18:02.9066448Z 192b8944f15a: Pull complete 2022-12-01T10:18:03.0328049Z 3dd92c832839: Pull complete 2022-12-01T10:18:45.3897020Z 5e1b0146a21d: Pull complete 2022-12-01T10:18:47.2383438Z c76cf0339419: Pull complete 2022-12-01T10:18:49.1168487Z 2dc4928eecf0: Pull complete 2022-12-01T10:18:50.3853417Z Digest: sha256:217fd7de680e1dd5bca4b2b4054bd05a8d454df5e210ffbf1e5955e01cf1f340 2022-12-01T10:18:50.9030313Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fa72f5a0a230eb632055220542038bd4ceca184b 2022-12-01T10:18:51.1849206Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fa72f5a0a230eb632055220542038bd4ceca184b 2022-12-01T10:18:51.1944715Z ##[group]Run nick-fields/retry@7d4a37704547a311dbb66ebdf5b23ec19374a767 2022-12-01T10:18:51.1945188Z with: 2022-12-01T10:18:51.1945402Z timeout_minutes: 10 2022-12-01T10:18:51.1945646Z max_attempts: 3 2022-12-01T10:18:51.1946031Z command: set -ex bash .github/scripts/install_nvidia_utils_linux.sh echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" 2022-12-01T10:18:51.1946411Z retry_wait_seconds: 10 2022-12-01T10:18:51.1946658Z polling_interval_seconds: 1 2022-12-01T10:18:51.1946922Z warning_on_retry: true 2022-12-01T10:18:51.1947178Z continue_on_error: false 2022-12-01T10:18:51.1947399Z env: 2022-12-01T10:18:51.1947629Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:18:51.1947880Z ##[endgroup] 2022-12-01T10:18:51.2511214Z 2022-12-01T10:18:51.2533760Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/ 2022-12-01T10:18:51.2576545Z + bash .github/scripts/install_nvidia_utils_linux.sh 2022-12-01T10:18:51.2587432Z == Installing nvidia driver NVIDIA-Linux-x86_64-515.57.run == 2022-12-01T10:18:51.2590422Z + HAS_NVIDIA_DRIVER=0 2022-12-01T10:18:51.2594193Z ++ command -v nvidia-smi 2022-12-01T10:18:51.2596704Z + '[' -x /usr/bin/nvidia-smi ']' 2022-12-01T10:18:51.2600299Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader 2022-12-01T10:18:54.1660895Z + INSTALLED_DRIVER_VERSION='515.76 2022-12-01T10:18:54.1661414Z 515.76' 2022-12-01T10:18:54.1661690Z + '[' '515.76 2022-12-01T10:18:54.1661950Z 515.76' '!=' 515.57 ']' 2022-12-01T10:18:54.1664674Z + HAS_NVIDIA_DRIVER=1 2022-12-01T10:18:54.1665084Z + echo 'NVIDIA driver (515.76 2022-12-01T10:18:54.1665652Z 515.76) has been installed, but we expect to have 515.57 instead. Skipping NVIDIA driver installation for now until torchrec and FBGEMM are updated to use PyTorch NVIDIA installation script instead of RHEL repo' 2022-12-01T10:18:54.1666205Z + '[' 1 -eq 0 ']' 2022-12-01T10:18:54.1666462Z + nvidia-smi 2022-12-01T10:18:54.1666722Z NVIDIA driver (515.76 2022-12-01T10:18:54.1667241Z 515.76) has been installed, but we expect to have 515.57 instead. 
Skipping NVIDIA driver installation for now until torchrec and FBGEMM are updated to use PyTorch NVIDIA installation script instead of RHEL repo 2022-12-01T10:18:54.1861844Z Thu Dec 1 10:18:54 2022 2022-12-01T10:18:54.1862589Z +-----------------------------------------------------------------------------+ 2022-12-01T10:18:54.1863116Z | NVIDIA-SMI 515.76 Driver Version: 515.76 CUDA Version: 11.7 | 2022-12-01T10:18:54.1863608Z |-------------------------------+----------------------+----------------------+ 2022-12-01T10:18:54.1864119Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2022-12-01T10:18:54.1864600Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2022-12-01T10:18:54.1864966Z | | | MIG M. | 2022-12-01T10:18:54.1865258Z |===============================+======================+======================| 2022-12-01T10:18:54.1909622Z | 0 Tesla M60 Off | 00000000:00:1D.0 Off | 12325814270 | 2022-12-01T10:18:54.1909961Z | N/A 33C P0 39W / 150W | 0MiB / 7680MiB | 0% Default | 2022-12-01T10:18:54.1910267Z | | | N/A | 2022-12-01T10:18:54.1910733Z +-------------------------------+----------------------+----------------------+ 2022-12-01T10:18:54.1955912Z | 1 Tesla M60 Off | 00000000:00:1E.0 Off | 8030816435 | 2022-12-01T10:18:54.1956236Z | N/A 28C P0 39W / 150W | 0MiB / 7680MiB | 70% Default | 2022-12-01T10:18:54.1956551Z | | | N/A | 2022-12-01T10:18:54.1957285Z +-------------------------------+----------------------+----------------------+ 2022-12-01T10:18:54.1957660Z 2022-12-01T10:18:54.1958224Z +-----------------------------------------------------------------------------+ 2022-12-01T10:18:54.1958602Z | Processes: | 2022-12-01T10:18:54.1958950Z | GPU GI CI PID Type Process name GPU Memory | 2022-12-01T10:18:54.1959272Z | ID ID Usage | 2022-12-01T10:18:54.1959567Z |=============================================================================| 2022-12-01T10:18:54.1962171Z | No running processes found | 2022-12-01T10:18:54.1963046Z +-----------------------------------------------------------------------------+ 2022-12-01T10:18:54.2494174Z == Installing nvidia container toolkit for amzn2 == 2022-12-01T10:18:54.2498150Z + sudo yum install -y yum-utils 2022-12-01T10:18:54.8069812Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-12-01T10:18:56.5478755Z Package yum-utils-1.1.31-46.amzn2.0.1.noarch already installed and latest version 2022-12-01T10:18:56.5479169Z Nothing to do 2022-12-01T10:18:56.6307785Z + sudo yum-config-manager --add-repo https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo 2022-12-01T10:18:57.2096123Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-12-01T10:18:57.2656955Z adding repo from: https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo 2022-12-01T10:18:57.2657648Z grabbing file https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo to /etc/yum.repos.d/nvidia-docker.repo 2022-12-01T10:18:57.2658172Z repo saved to /etc/yum.repos.d/nvidia-docker.repo 2022-12-01T10:18:57.2811754Z + sudo yum install -y nvidia-docker2 2022-12-01T10:18:57.8443732Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-12-01T10:18:59.1969878Z Package nvidia-docker2-2.11.0-1.noarch already installed and latest version 2022-12-01T10:18:59.1970287Z Nothing to do 2022-12-01T10:18:59.2799378Z + sudo systemctl restart docker 2022-12-01T10:19:18.6735879Z + echo 'GPU_FLAG=--gpus all' 2022-12-01T10:19:19.2930086Z Command completed after 1 attempt(s). 
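The set -x trace above shows the guard in .github/scripts/install_nvidia_utils_linux.sh: the step skips the driver install whenever nvidia-smi already reports a driver, even one newer than the pinned 515.57. A minimal bash sketch reconstructed from that trace (branch structure inferred from the trace ordering; install_nvidia_driver is a hypothetical placeholder, since the install branch is not exercised in this run):

    DRIVER_VERSION="515.57"                      # pinned version named in the log above
    HAS_NVIDIA_DRIVER=0
    if [ -x "$(command -v nvidia-smi)" ]; then
        INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
        if [ "${INSTALLED_DRIVER_VERSION}" != "${DRIVER_VERSION}" ]; then
            HAS_NVIDIA_DRIVER=1
            echo "NVIDIA driver (${INSTALLED_DRIVER_VERSION}) has been installed, but we expect to have ${DRIVER_VERSION} instead. Skipping NVIDIA driver installation."
        fi
    fi
    if [ "${HAS_NVIDIA_DRIVER}" -eq 0 ]; then
        install_nvidia_driver "${DRIVER_VERSION}"    # hypothetical helper; this branch did not run here
    fi
    nvidia-smi

Skipping rather than failing keeps the job running on runner AMIs that already ship a newer driver (515.76 here), at the cost of testing against a driver other than the pinned one.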
2022-12-01T10:19:19.2930548Z 2022-12-01T10:19:19.2933437Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/ 2022-12-01T10:19:19.3008401Z ##[group]Run python3 -m pip install psutil==5.9.1 2022-12-01T10:19:19.3008803Z python3 -m pip install psutil==5.9.1 2022-12-01T10:19:19.3009110Z python3 -m pip install pynvml==11.4.1 2022-12-01T10:19:19.3009462Z python3 -m tools.stats.monitor > usage_log.txt 2>&1 & 2022-12-01T10:19:19.3009842Z echo "::set-output name=monitor-script-pid::${!}" 2022-12-01T10:19:19.3023112Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T10:19:19.3023401Z env: 2022-12-01T10:19:19.3023643Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:19:19.3023901Z GPU_FLAG: --gpus all 2022-12-01T10:19:19.3024133Z ##[endgroup] 2022-12-01T10:19:19.6009308Z Defaulting to user installation because normal site-packages is not writeable 2022-12-01T10:19:19.6239082Z Requirement already satisfied: psutil==5.9.1 in /home/ec2-user/.local/lib/python3.7/site-packages (5.9.1) 2022-12-01T10:19:20.2027747Z Defaulting to user installation because normal site-packages is not writeable 2022-12-01T10:19:20.2259114Z Requirement already satisfied: pynvml==11.4.1 in /home/ec2-user/.local/lib/python3.7/site-packages (11.4.1) 2022-12-01T10:19:20.5123457Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/ 2022-12-01T10:19:20.5172082Z Prepare all required actions 2022-12-01T10:19:20.5172572Z Getting action download info 2022-12-01T10:19:20.6995299Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:4a8bfae15cc25cc0785c1603ee87a9da8fd442ea) 2022-12-01T10:19:20.9604993Z Download action repository 'actions/download-artifact@v2' (SHA:f023be2c48cc18debc3bacd34cb396e0295e2869) 2022-12-01T10:19:21.0852086Z ##[group]Run ./.github/actions/download-build-artifacts 2022-12-01T10:19:21.0852396Z with: 2022-12-01T10:19:21.0852677Z name: linux-bionic-cuda11.6-py3.10-gcc7 2022-12-01T10:19:21.0852937Z env: 2022-12-01T10:19:21.0853174Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:19:21.0853437Z GPU_FLAG: --gpus all 2022-12-01T10:19:21.0853668Z ##[endgroup] 2022-12-01T10:19:21.0887364Z ##[group]Run seemethere/download-artifact-s3@v4 2022-12-01T10:19:21.0887659Z with: 2022-12-01T10:19:21.0887922Z name: linux-bionic-cuda11.6-py3.10-gcc7 2022-12-01T10:19:21.0888231Z s3-bucket: gha-artifacts 2022-12-01T10:19:21.0888540Z region: us-east-1 2022-12-01T10:19:21.0888776Z env: 2022-12-01T10:19:21.0889024Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:19:21.0889271Z GPU_FLAG: --gpus all 2022-12-01T10:19:21.0889520Z ##[endgroup] 2022-12-01T10:19:21.6812714Z Found 1 objects with prefix pytorch/pytorch/3591403534/linux-bionic-cuda11.6-py3.10-gcc7/ 2022-12-01T10:19:21.6813342Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2022-12-01T10:19:35.9068570Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2022-12-01T10:19:35.9069155Z 2022-12-01T10:19:35.9074094Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. 
For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/ 2022-12-01T10:19:35.9077343Z Artifact download has finished successfully 2022-12-01T10:19:35.9287124Z ##[group]Run unzip -o artifacts.zip 2022-12-01T10:19:35.9287459Z unzip -o artifacts.zip 2022-12-01T10:19:35.9300836Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T10:19:35.9301142Z env: 2022-12-01T10:19:35.9301385Z GIT_DEFAULT_BRANCH: master 2022-12-01T10:19:35.9301640Z GPU_FLAG: --gpus all 2022-12-01T10:19:35.9301891Z ##[endgroup] 2022-12-01T10:19:35.9345304Z Archive: artifacts.zip 2022-12-01T10:19:35.9347805Z creating: dist/ 2022-12-01T10:19:38.0051836Z inflating: dist/torch-1.13.0a0+gitc13d400-cp310-cp310-linux_x86_64.whl 2022-12-01T10:19:38.0052283Z creating: build/custom_test_artifacts/ 2022-12-01T10:19:38.0052696Z creating: build/custom_test_artifacts/custom-op-build/ 2022-12-01T10:19:38.0053178Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2022-12-01T10:19:38.0059861Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeOutput.log 2022-12-01T10:19:38.0060406Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/ 2022-12-01T10:19:38.0060960Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2022-12-01T10:19:38.0061543Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/ 2022-12-01T10:19:38.0062098Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2022-12-01T10:19:38.0064592Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2022-12-01T10:19:38.0065749Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2022-12-01T10:19:38.0066322Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2022-12-01T10:19:38.0066894Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2022-12-01T10:19:38.0069776Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2022-12-01T10:19:38.0071172Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2022-12-01T10:19:38.0072614Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2022-12-01T10:19:38.0073444Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2022-12-01T10:19:38.0075505Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2022-12-01T10:19:38.0076443Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2022-12-01T10:19:38.0077083Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2022-12-01T10:19:38.0077667Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2022-12-01T10:19:38.0132183Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2022-12-01T10:19:38.0132935Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2022-12-01T10:19:38.0133679Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2022-12-01T10:19:38.0134693Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2022-12-01T10:19:38.0135446Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2022-12-01T10:19:38.0136152Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2022-12-01T10:19:38.0136865Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2022-12-01T10:19:38.0137573Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2022-12-01T10:19:38.0138583Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2022-12-01T10:19:38.0180683Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2022-12-01T10:19:38.0222018Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2022-12-01T10:19:38.0223120Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2022-12-01T10:19:38.0223902Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2022-12-01T10:19:38.0224562Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2022-12-01T10:19:38.0225402Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2022-12-01T10:19:38.0226476Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2022-12-01T10:19:38.0227374Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2022-12-01T10:19:38.0229500Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2022-12-01T10:19:38.0302888Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2022-12-01T10:19:38.0376210Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2022-12-01T10:19:38.0376860Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2022-12-01T10:19:38.0377407Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2022-12-01T10:19:38.0378417Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeError.log 2022-12-01T10:19:38.0379089Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2022-12-01T10:19:38.0379637Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2022-12-01T10:19:38.0380210Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2022-12-01T10:19:38.0380830Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2022-12-01T10:19:38.0381439Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2022-12-01T10:19:38.0382163Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2022-12-01T10:19:38.0382899Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2022-12-01T10:19:38.0383900Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2022-12-01T10:19:38.0384714Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2022-12-01T10:19:38.0385449Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2022-12-01T10:19:38.0386042Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2022-12-01T10:19:38.0407364Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2022-12-01T10:19:38.0521101Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2022-12-01T10:19:38.0521683Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2022-12-01T10:19:38.0522285Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2022-12-01T10:19:38.0523478Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2022-12-01T10:19:38.0524092Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2022-12-01T10:19:38.0524700Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2022-12-01T10:19:38.0525312Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2022-12-01T10:19:38.0526153Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2022-12-01T10:19:38.0526784Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2022-12-01T10:19:38.0527563Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2022-12-01T10:19:38.0528175Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2022-12-01T10:19:38.0549246Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2022-12-01T10:19:38.0631338Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2022-12-01T10:19:38.0631975Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2022-12-01T10:19:38.0632585Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2022-12-01T10:19:38.0633157Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2022-12-01T10:19:38.0633908Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2022-12-01T10:19:38.0635056Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2022-12-01T10:19:38.0635597Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2022-12-01T10:19:38.0638664Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2022-12-01T10:19:38.0639494Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2022-12-01T10:19:38.0640230Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2022-12-01T10:19:38.0733168Z inflating: 
build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2022-12-01T10:19:38.0795400Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2022-12-01T10:19:38.0795880Z creating: build/custom_test_artifacts/jit-hook-build/ 2022-12-01T10:19:38.0796328Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2022-12-01T10:19:38.0802990Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeOutput.log 2022-12-01T10:19:38.0803804Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/ 2022-12-01T10:19:38.0804371Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2022-12-01T10:19:38.0804923Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/ 2022-12-01T10:19:38.0805481Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2022-12-01T10:19:38.0807873Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2022-12-01T10:19:38.0810058Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2022-12-01T10:19:38.0810614Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2022-12-01T10:19:38.0811176Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2022-12-01T10:19:38.0813303Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2022-12-01T10:19:38.0814433Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2022-12-01T10:19:38.0816466Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2022-12-01T10:19:38.0817106Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2022-12-01T10:19:38.0818408Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2022-12-01T10:19:38.0819583Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2022-12-01T10:19:38.0820155Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2022-12-01T10:19:38.0820724Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2022-12-01T10:19:38.0875751Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2022-12-01T10:19:38.0876481Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2022-12-01T10:19:38.0877202Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2022-12-01T10:19:38.0877958Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2022-12-01T10:19:38.0879115Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2022-12-01T10:19:38.0879829Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2022-12-01T10:19:38.0880710Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2022-12-01T10:19:38.0881578Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2022-12-01T10:19:38.0882914Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2022-12-01T10:19:38.0925543Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2022-12-01T10:19:38.0967438Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2022-12-01T10:19:38.0968511Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2022-12-01T10:19:38.0969391Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2022-12-01T10:19:38.0970257Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2022-12-01T10:19:38.0970899Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2022-12-01T10:19:38.0971915Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2022-12-01T10:19:38.0972831Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2022-12-01T10:19:38.0974960Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2022-12-01T10:19:38.1048705Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2022-12-01T10:19:38.1121782Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2022-12-01T10:19:38.1123161Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2022-12-01T10:19:38.1123709Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2022-12-01T10:19:38.1124638Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeError.log 2022-12-01T10:19:38.1125366Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2022-12-01T10:19:38.1125932Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2022-12-01T10:19:38.1126534Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2022-12-01T10:19:38.1127166Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2022-12-01T10:19:38.1127759Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2022-12-01T10:19:38.1128582Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2022-12-01T10:19:38.1129187Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2022-12-01T10:19:38.1130472Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2022-12-01T10:19:38.1131250Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2022-12-01T10:19:38.1131888Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2022-12-01T10:19:38.1132479Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2022-12-01T10:19:38.1153605Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2022-12-01T10:19:38.1217172Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2022-12-01T10:19:38.1217816Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2022-12-01T10:19:38.1218407Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2022-12-01T10:19:38.1218979Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2022-12-01T10:19:38.1219884Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2022-12-01T10:19:38.1221104Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2022-12-01T10:19:38.1221748Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2022-12-01T10:19:38.1224674Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2022-12-01T10:19:38.1225391Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2022-12-01T10:19:38.1226296Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2022-12-01T10:19:38.1275832Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2022-12-01T10:19:38.1276331Z creating: build/custom_test_artifacts/custom-backend-build/ 2022-12-01T10:19:38.1276830Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2022-12-01T10:19:38.1283935Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeOutput.log 2022-12-01T10:19:38.1284503Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/ 2022-12-01T10:19:38.1285089Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2022-12-01T10:19:38.1285679Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/ 2022-12-01T10:19:38.1286473Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2022-12-01T10:19:38.1289543Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2022-12-01T10:19:38.1290735Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2022-12-01T10:19:38.1291333Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2022-12-01T10:19:38.1291925Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2022-12-01T10:19:38.1294640Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2022-12-01T10:19:38.1295821Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2022-12-01T10:19:38.1297654Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2022-12-01T10:19:38.1298313Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2022-12-01T10:19:38.1299912Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2022-12-01T10:19:38.1300960Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2022-12-01T10:19:38.1301577Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2022-12-01T10:19:38.1302175Z creating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2022-12-01T10:19:38.1356720Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2022-12-01T10:19:38.1357480Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2022-12-01T10:19:38.1358235Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2022-12-01T10:19:38.1358998Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2022-12-01T10:19:38.1359753Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2022-12-01T10:19:38.1360473Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2022-12-01T10:19:38.1361346Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2022-12-01T10:19:38.1362182Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2022-12-01T10:19:38.1363267Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2022-12-01T10:19:38.1405366Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2022-12-01T10:19:38.1446774Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2022-12-01T10:19:38.1447763Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2022-12-01T10:19:38.1448665Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2022-12-01T10:19:38.1449356Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2022-12-01T10:19:38.1450163Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2022-12-01T10:19:38.1451124Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2022-12-01T10:19:38.1452173Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2022-12-01T10:19:38.1454248Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2022-12-01T10:19:38.1527328Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2022-12-01T10:19:38.1600369Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2022-12-01T10:19:38.1601055Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2022-12-01T10:19:38.1601635Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2022-12-01T10:19:38.1602635Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeError.log 2022-12-01T10:19:38.1603246Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 
2022-12-01T10:19:38.1603831Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2022-12-01T10:19:38.1604442Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2022-12-01T10:19:38.1605106Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2022-12-01T10:19:38.1605751Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2022-12-01T10:19:38.1606514Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2022-12-01T10:19:38.1607275Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2022-12-01T10:19:38.1608401Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2022-12-01T10:19:38.1609191Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2022-12-01T10:19:38.1609841Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2022-12-01T10:19:38.1610485Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2022-12-01T10:19:38.1615356Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2022-12-01T10:19:38.1763154Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2022-12-01T10:19:38.1764166Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2022-12-01T10:19:38.1764818Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2022-12-01T10:19:38.1765499Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2022-12-01T10:19:38.1766147Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2022-12-01T10:19:38.1766781Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2022-12-01T10:19:38.1767427Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2022-12-01T10:19:38.1768352Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2022-12-01T10:19:38.1769004Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2022-12-01T10:19:38.1769660Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2022-12-01T10:19:38.1770310Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2022-12-01T10:19:38.1791128Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2022-12-01T10:19:38.1849863Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2022-12-01T10:19:38.1850538Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2022-12-01T10:19:38.1851187Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 
2022-12-01T10:19:38.1851783Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2022-12-01T10:19:38.1852542Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2022-12-01T10:19:38.1853765Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2022-12-01T10:19:38.1855079Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2022-12-01T10:19:38.1858124Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2022-12-01T10:19:38.1858824Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2022-12-01T10:19:38.1859799Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2022-12-01T10:19:38.1978984Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2022-12-01T10:19:38.2024664Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2022-12-01T10:19:38.2025032Z creating: build/lib/ 2022-12-01T10:19:38.2025718Z inflating: build/lib/libclog.a 2022-12-01T10:19:38.2092402Z inflating: build/lib/libgtest.a 2022-12-01T10:19:38.2102615Z inflating: build/lib/libpthreadpool.a 2022-12-01T10:19:38.2209050Z inflating: build/lib/libprotobuf-lite.a 2022-12-01T10:19:38.2300529Z inflating: build/lib/libbenchmark.a 2022-12-01T10:19:38.2309707Z inflating: build/lib/libittnotify.a 2022-12-01T10:19:38.2387316Z inflating: build/lib/libasmjit.a 2022-12-01T10:19:38.2922156Z inflating: build/lib/libprotobuf.a 2022-12-01T10:19:38.2954361Z inflating: build/lib/libtensorpipe_uv.a 2022-12-01T10:19:38.3087739Z inflating: build/lib/libgloo.a 2022-12-01T10:19:38.3107199Z inflating: build/lib/libfmt.a 2022-12-01T10:19:38.3109083Z inflating: build/lib/libcaffe2_nvrtc.so 2022-12-01T10:19:38.3109688Z inflating: build/lib/libfoxi_loader.a 2022-12-01T10:19:38.3186913Z inflating: build/lib/libc10.so 2022-12-01T10:19:38.3188155Z inflating: build/lib/libtorch_global_deps.so 2022-12-01T10:19:38.3198191Z inflating: build/lib/libcpuinfo.a 2022-12-01T10:19:38.3769773Z inflating: build/lib/libprotoc.a 2022-12-01T10:19:38.3778938Z inflating: build/lib/libcpuinfo_internals.a 2022-12-01T10:19:38.3794551Z inflating: build/lib/libqnnpack.a 2022-12-01T10:19:38.3818728Z inflating: build/lib/libpytorch_qnnpack.a 2022-12-01T10:19:38.3821305Z inflating: build/lib/libnnpack_reference_layers.a 2022-12-01T10:19:38.3843710Z inflating: build/lib/libnnpack.a 2022-12-01T10:19:38.3862703Z inflating: build/lib/libgmock.a 2022-12-01T10:19:38.3863341Z inflating: build/lib/libgtest_main.a 2022-12-01T10:19:38.3864417Z inflating: build/lib/libbenchmark_main.a 2022-12-01T10:19:39.2005754Z inflating: build/lib/libdnnl.a 2022-12-01T10:19:39.2147534Z inflating: build/lib/libXNNPACK.a 2022-12-01T10:19:39.2804936Z inflating: build/lib/libtensorpipe.a 2022-12-01T10:19:39.2846958Z inflating: build/lib/libc10_cuda.so 2022-12-01T10:19:39.4392325Z inflating: build/lib/libfbgemm.a 2022-12-01T10:19:39.4392677Z inflating: build/lib/libgmock_main.a 2022-12-01T10:19:39.5526160Z inflating: build/lib/libdnnl_graph.a 2022-12-01T10:19:39.5949090Z inflating: build/lib/libkineto.a 2022-12-01T10:19:39.6239335Z inflating: build/lib/libtensorpipe_cuda.a 2022-12-01T10:19:39.6285037Z inflating: build/lib/libcaffe2_protos.a 2022-12-01T10:19:39.6333156Z inflating: build/lib/libonnx_proto.a 2022-12-01T10:19:39.7012740Z inflating: build/lib/libonnx.a 2022-12-01T10:19:39.7446244Z inflating: build/lib/libgloo_cuda.a 2022-12-01T10:19:42.0696704Z inflating: 
build/lib/libtorch_cpu.so 2022-12-01T10:19:42.4134473Z inflating: build/lib/libtorch_cuda_cpp.so 2022-12-01T10:19:44.1855349Z inflating: build/lib/libtorch_cuda_cu.so 2022-12-01T10:19:44.1856207Z inflating: build/lib/libtorch_cuda.so 2022-12-01T10:19:44.1857900Z inflating: build/lib/libtorch.so 2022-12-01T10:19:45.1762213Z inflating: build/lib/libtorch_cuda_linalg.so 2022-12-01T10:19:45.1765130Z inflating: build/lib/libc10d_cuda_test.so 2022-12-01T10:19:45.1788943Z inflating: build/lib/libjitbackend_test.so 2022-12-01T10:19:45.1849385Z inflating: build/lib/libtorchbind_test.so 2022-12-01T10:19:45.1879968Z inflating: build/lib/libbackend_with_compiler.so 2022-12-01T10:19:45.1885020Z inflating: build/lib/libshm.so 2022-12-01T10:19:45.3672396Z inflating: build/lib/libtorch_python.so 2022-12-01T10:19:45.3712058Z inflating: build/lib/libnnapi_backend.so 2022-12-01T10:19:45.3712355Z creating: build/bin/ 2022-12-01T10:19:45.3765301Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2022-12-01T10:19:45.3820581Z inflating: build/bin/c10_DeviceGuard_test 2022-12-01T10:19:45.3874239Z inflating: build/bin/c10_Device_test 2022-12-01T10:19:45.3938559Z inflating: build/bin/c10_DispatchKeySet_test 2022-12-01T10:19:45.3988825Z inflating: build/bin/c10_StreamGuard_test 2022-12-01T10:19:45.4042413Z inflating: build/bin/c10_SymInt_test 2022-12-01T10:19:45.4102219Z inflating: build/bin/c10_InlineDeviceGuard_test 2022-12-01T10:19:45.4161978Z inflating: build/bin/c10_InlineStreamGuard_test 2022-12-01T10:19:45.4223024Z inflating: build/bin/c10_SizesAndStrides_test 2022-12-01T10:19:45.4274272Z inflating: build/bin/c10_Array_test 2022-12-01T10:19:45.4331054Z inflating: build/bin/c10_Bitset_test 2022-12-01T10:19:45.4382704Z inflating: build/bin/c10_ConstexprCrc_test 2022-12-01T10:19:45.4437415Z inflating: build/bin/c10_C++17_test 2022-12-01T10:19:45.4490119Z inflating: build/bin/c10_DeadlockDetection_test 2022-12-01T10:19:45.4542998Z inflating: build/bin/c10_Half_test 2022-12-01T10:19:45.4603643Z inflating: build/bin/c10_LeftRight_test 2022-12-01T10:19:45.4670938Z inflating: build/bin/c10_Metaprogramming_test 2022-12-01T10:19:45.4827341Z inflating: build/bin/c10_SmallVectorTest 2022-12-01T10:19:45.4881274Z inflating: build/bin/c10_Synchronized_test 2022-12-01T10:19:45.4942472Z inflating: build/bin/c10_ThreadLocal_test 2022-12-01T10:19:45.4998575Z inflating: build/bin/c10_TypeIndex_test 2022-12-01T10:19:45.5050151Z inflating: build/bin/c10_TypeTraits_test 2022-12-01T10:19:45.5105241Z inflating: build/bin/c10_accumulate_test 2022-12-01T10:19:45.5159018Z inflating: build/bin/c10_TypeList_test 2022-12-01T10:19:45.5218791Z inflating: build/bin/c10_bfloat16_test 2022-12-01T10:19:45.5276359Z inflating: build/bin/c10_complex_math_test 2022-12-01T10:19:45.5336513Z inflating: build/bin/c10_complex_test 2022-12-01T10:19:45.5454651Z inflating: build/bin/c10_either_test 2022-12-01T10:19:45.5510477Z inflating: build/bin/c10_exception_test 2022-12-01T10:19:45.5564204Z inflating: build/bin/c10_flags_test 2022-12-01T10:19:45.5618234Z inflating: build/bin/c10_irange_test 2022-12-01T10:19:45.5800996Z inflating: build/bin/c10_intrusive_ptr_test 2022-12-01T10:19:45.5863163Z inflating: build/bin/c10_logging_test 2022-12-01T10:19:45.5943852Z inflating: build/bin/c10_optional_test 2022-12-01T10:19:45.6010208Z inflating: build/bin/c10_ordered_preserving_dict_test 2022-12-01T10:19:45.6068702Z inflating: build/bin/c10_registry_test 2022-12-01T10:19:45.6132122Z inflating: build/bin/c10_string_view_test 2022-12-01T10:19:45.6187370Z 
inflating: build/bin/c10_tempfile_test 2022-12-01T10:19:45.6247895Z inflating: build/bin/c10_typeid_test 2022-12-01T10:19:45.6308193Z inflating: build/bin/c10_intrusive_ptr_benchmark 2022-12-01T10:19:45.6832156Z inflating: build/bin/protoc-3.13.0.0 2022-12-01T10:19:45.7354830Z inflating: build/bin/protoc 2022-12-01T10:19:45.7406774Z inflating: build/bin/c10_cuda_CUDATest 2022-12-01T10:19:45.7725060Z inflating: build/bin/vec_test_all_types_DEFAULT 2022-12-01T10:19:45.8080327Z inflating: build/bin/vec_test_all_types_AVX2 2022-12-01T10:19:45.8137776Z inflating: build/bin/HashStoreTest 2022-12-01T10:19:45.8202379Z inflating: build/bin/TCPStoreTest 2022-12-01T10:19:45.8260001Z inflating: build/bin/FileStoreTest 2022-12-01T10:19:45.8275746Z inflating: build/bin/ProcessGroupMPITest 2022-12-01T10:19:45.8279015Z inflating: build/bin/example_allreduce 2022-12-01T10:19:45.8335306Z inflating: build/bin/Dimname_test 2022-12-01T10:19:45.8414175Z inflating: build/bin/Dict_test 2022-12-01T10:19:45.8482337Z inflating: build/bin/MaybeOwned_test 2022-12-01T10:19:45.8545886Z inflating: build/bin/apply_utils_test 2022-12-01T10:19:45.8606987Z inflating: build/bin/NamedTensor_test 2022-12-01T10:19:45.8670252Z inflating: build/bin/atest 2022-12-01T10:19:45.8728455Z inflating: build/bin/broadcast_test 2022-12-01T10:19:45.8794059Z inflating: build/bin/basic 2022-12-01T10:19:45.8856897Z inflating: build/bin/cpu_generator_test 2022-12-01T10:19:45.8912811Z inflating: build/bin/cpu_profiling_allocator_test 2022-12-01T10:19:45.8966271Z inflating: build/bin/dispatch_key_set_test 2022-12-01T10:19:45.9061168Z inflating: build/bin/cpu_rng_test 2022-12-01T10:19:45.9114187Z inflating: build/bin/dlconvertor_test 2022-12-01T10:19:45.9176864Z inflating: build/bin/extension_backend_test 2022-12-01T10:19:45.9236940Z inflating: build/bin/half_test 2022-12-01T10:19:45.9289831Z inflating: build/bin/lazy_tensor_test 2022-12-01T10:19:45.9391728Z inflating: build/bin/ivalue_test 2022-12-01T10:19:45.9449133Z inflating: build/bin/math_kernel_test 2022-12-01T10:19:45.9506379Z inflating: build/bin/memory_format_test 2022-12-01T10:19:45.9562796Z inflating: build/bin/memory_overlapping_test 2022-12-01T10:19:45.9617121Z inflating: build/bin/operator_name_test 2022-12-01T10:19:45.9677456Z inflating: build/bin/native_test 2022-12-01T10:19:45.9733449Z inflating: build/bin/mobile_memory_cleanup 2022-12-01T10:19:45.9786812Z inflating: build/bin/operators_test 2022-12-01T10:19:45.9842867Z inflating: build/bin/packedtensoraccessor_test 2022-12-01T10:19:45.9913308Z inflating: build/bin/pow_test 2022-12-01T10:19:45.9974934Z inflating: build/bin/quantized_test 2022-12-01T10:19:46.0029039Z inflating: build/bin/reportMemoryUsage_test 2022-12-01T10:19:46.0081799Z inflating: build/bin/reduce_ops_test 2022-12-01T10:19:46.0142420Z inflating: build/bin/scalar_tensor_test 2022-12-01T10:19:46.0205567Z inflating: build/bin/scalar_test 2022-12-01T10:19:46.0261040Z inflating: build/bin/stride_properties_test 2022-12-01T10:19:46.0263680Z inflating: build/bin/thread_init_test 2022-12-01T10:19:46.0348341Z inflating: build/bin/tensor_iterator_test 2022-12-01T10:19:46.0408189Z inflating: build/bin/type_ptr_test 2022-12-01T10:19:46.0468039Z inflating: build/bin/test_parallel 2022-12-01T10:19:46.0533037Z inflating: build/bin/type_test 2022-12-01T10:19:46.0585934Z inflating: build/bin/variant_test 2022-12-01T10:19:46.0641512Z inflating: build/bin/undefined_tensor_test 2022-12-01T10:19:46.0716246Z inflating: build/bin/vmap_test 2022-12-01T10:19:46.0718621Z inflating: 
build/bin/verify_api_visibility 2022-12-01T10:19:46.0773285Z inflating: build/bin/weakref_test 2022-12-01T10:19:46.0827447Z inflating: build/bin/wrapdim_test 2022-12-01T10:19:46.0892312Z inflating: build/bin/IListRef_test 2022-12-01T10:19:46.0944393Z inflating: build/bin/xla_tensor_test 2022-12-01T10:19:46.1063494Z inflating: build/bin/List_test 2022-12-01T10:19:46.1133464Z inflating: build/bin/KernelFunction_test 2022-12-01T10:19:46.1265126Z inflating: build/bin/kernel_function_legacy_test 2022-12-01T10:19:46.1369378Z inflating: build/bin/kernel_function_test 2022-12-01T10:19:46.1508287Z inflating: build/bin/kernel_lambda_legacy_test 2022-12-01T10:19:46.1620126Z inflating: build/bin/kernel_lambda_test 2022-12-01T10:19:46.1684678Z inflating: build/bin/kernel_stackbased_test 2022-12-01T10:19:46.1788164Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2022-12-01T10:19:46.1842218Z inflating: build/bin/CppSignature_test 2022-12-01T10:19:46.1893624Z inflating: build/bin/op_allowlist_test 2022-12-01T10:19:46.1950047Z inflating: build/bin/inline_container_test 2022-12-01T10:19:46.2264426Z inflating: build/bin/op_registration_test 2022-12-01T10:19:46.2324951Z inflating: build/bin/backend_fallback_test 2022-12-01T10:19:46.2381041Z inflating: build/bin/cuda_apply_test 2022-12-01T10:19:46.2438403Z inflating: build/bin/cuda_caching_host_allocator_test 2022-12-01T10:19:46.2503524Z inflating: build/bin/cuda_atomic_ops_test 2022-12-01T10:19:46.2576593Z inflating: build/bin/cuda_complex_math_test 2022-12-01T10:19:46.2629469Z inflating: build/bin/cuda_device_test 2022-12-01T10:19:46.2692384Z inflating: build/bin/cuda_complex_test 2022-12-01T10:19:46.2756307Z inflating: build/bin/cuda_cub_test 2022-12-01T10:19:46.2809928Z inflating: build/bin/cuda_dlconvertor_test 2022-12-01T10:19:46.2864112Z inflating: build/bin/cuda_integer_divider_test 2022-12-01T10:19:46.2936432Z inflating: build/bin/cuda_distributions_test 2022-12-01T10:19:46.2999072Z inflating: build/bin/cuda_generator_test 2022-12-01T10:19:46.3052180Z inflating: build/bin/cuda_half_test 2022-12-01T10:19:46.3104365Z inflating: build/bin/cuda_optional_test 2022-12-01T10:19:46.3160319Z inflating: build/bin/cuda_reportMemoryUsage_test 2022-12-01T10:19:46.3225681Z inflating: build/bin/cuda_stream_test 2022-12-01T10:19:46.3280326Z inflating: build/bin/cuda_packedtensoraccessor_test 2022-12-01T10:19:46.3332329Z inflating: build/bin/cuda_cudnn_test 2022-12-01T10:19:46.3388319Z inflating: build/bin/cuda_vectorized_test 2022-12-01T10:19:46.3406159Z inflating: build/bin/tutorial_tensorexpr 2022-12-01T10:19:46.3475837Z inflating: build/bin/ProcessGroupGlooTest 2022-12-01T10:19:46.3538269Z inflating: build/bin/ProcessGroupGlooAsyncTest 2022-12-01T10:19:46.3604497Z inflating: build/bin/ProcessGroupNCCLTest 2022-12-01T10:19:46.3666788Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2022-12-01T10:19:46.3723755Z inflating: build/bin/ProcessGroupUCCTest 2022-12-01T10:19:46.3781884Z inflating: build/bin/test_dist_autograd 2022-12-01T10:19:46.3784547Z inflating: build/bin/parallel_benchmark 2022-12-01T10:19:46.3859582Z inflating: build/bin/test_cpp_rpc 2022-12-01T10:19:46.3933750Z inflating: build/bin/test_mobile_nnc 2022-12-01T10:19:46.4855741Z inflating: build/bin/test_tensorexpr 2022-12-01T10:19:46.4867034Z inflating: build/bin/aot_model_compiler_test 2022-12-01T10:19:46.5252104Z inflating: build/bin/test_lazy 2022-12-01T10:19:46.5257732Z inflating: build/bin/torch_shm_manager 2022-12-01T10:19:46.6583150Z inflating: build/bin/test_api 
2022-12-01T10:19:46.7755836Z inflating: build/bin/test_jit
2022-12-01T10:19:46.7758003Z inflating: .pytorch-test-times.json
2022-12-01T10:19:46.7789213Z ##[group]Run df -H
2022-12-01T10:19:46.7789476Z df -H
2022-12-01T10:19:46.7803124Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2022-12-01T10:19:46.7803431Z env:
2022-12-01T10:19:46.7803686Z   GIT_DEFAULT_BRANCH: master
2022-12-01T10:19:46.7803942Z   GPU_FLAG: --gpus all
2022-12-01T10:19:46.7804189Z ##[endgroup]
2022-12-01T10:19:46.7845111Z Filesystem      Size  Used Avail Use% Mounted on
2022-12-01T10:19:46.7845744Z devtmpfs        129G     0  129G   0% /dev
2022-12-01T10:19:46.7846315Z tmpfs           129G   13M  129G   1% /dev/shm
2022-12-01T10:19:46.7846817Z tmpfs           129G  553k  129G   1% /run
2022-12-01T10:19:46.7847379Z tmpfs           129G     0  129G   0% /sys/fs/cgroup
2022-12-01T10:19:46.7847973Z /dev/xvda1      162G   30G  132G  19% /
2022-12-01T10:19:46.7882969Z ##[group]Run .github/scripts/parse_ref.py
2022-12-01T10:19:46.7883328Z .github/scripts/parse_ref.py
2022-12-01T10:19:46.7895796Z shell: /usr/bin/bash -e {0}
2022-12-01T10:19:46.7896054Z env:
2022-12-01T10:19:46.7896281Z   GIT_DEFAULT_BRANCH: master
2022-12-01T10:19:46.7896555Z   GPU_FLAG: --gpus all
2022-12-01T10:19:46.7896806Z ##[endgroup]
2022-12-01T10:19:46.8135961Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
2022-12-01T10:19:46.8196395Z ##[group]Run set -x
2022-12-01T10:19:46.8196784Z set -x
2022-12-01T10:19:46.8197013Z 
2022-12-01T10:19:46.8197269Z if [[ $TEST_CONFIG == 'multigpu' ]]; then
2022-12-01T10:19:46.8197622Z   TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
2022-12-01T10:19:46.8197975Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then
2022-12-01T10:19:46.8198280Z   TEST_COMMAND=.jenkins/caffe2/test.sh
2022-12-01T10:19:46.8198555Z else
2022-12-01T10:19:46.8198837Z   TEST_COMMAND=.jenkins/pytorch/test.sh
2022-12-01T10:19:46.8199095Z fi
2022-12-01T10:19:46.8199316Z 
2022-12-01T10:19:46.8199636Z COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}")
2022-12-01T10:19:46.8199945Z 
2022-12-01T10:19:46.8200242Z # sanitize the input commit message and PR body here:
2022-12-01T10:19:46.8200536Z #
2022-12-01T10:19:46.8200896Z # trim all new lines from commit messages + PR_BODY to avoid issues with batch environment
2022-12-01T10:19:46.8201398Z # variable copying. see https://github.com/pytorch/pytorch/pull/80043#issuecomment-1167796028
2022-12-01T10:19:46.8201821Z COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}"
2022-12-01T10:19:46.8202136Z PR_BODY="${PR_BODY//[$'\n\r']}"
2022-12-01T10:19:46.8202693Z 
2022-12-01T10:19:46.8203083Z # then trim all special characters like single and double quotes to avoid unescaped inputs to
2022-12-01T10:19:46.8203463Z # wreak havoc internally
2022-12-01T10:19:46.8203766Z export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}"
2022-12-01T10:19:46.8204095Z export PR_BODY="${PR_BODY//[\'\"]}"
2022-12-01T10:19:46.8204499Z 
2022-12-01T10:19:46.8204812Z # detached container should get cleaned up by teardown_ec2_linux
2022-12-01T10:19:46.8205198Z # TODO: Stop building test binaries as part of the build phase
2022-12-01T10:19:46.8205569Z # Used for GPU_FLAG since that doesn't play nice
2022-12-01T10:19:46.8205901Z # shellcheck disable=SC2086,SC2090
2022-12-01T10:19:46.8206186Z container_name=$(docker run \
2022-12-01T10:19:46.8206464Z   ${GPU_FLAG:-} \
2022-12-01T10:19:46.8206735Z   -e BUILD_ENVIRONMENT \
2022-12-01T10:19:46.8206992Z   -e PR_NUMBER \
2022-12-01T10:19:46.8207260Z   -e GITHUB_ACTIONS \
2022-12-01T10:19:46.8207519Z   -e BASE_SHA \
2022-12-01T10:19:46.8207772Z   -e BRANCH \
2022-12-01T10:19:46.8207997Z   -e SHA1 \
2022-12-01T10:19:46.8208258Z   -e AWS_DEFAULT_REGION \
2022-12-01T10:19:46.8208530Z   -e IN_WHEEL_TEST \
2022-12-01T10:19:46.8208782Z   -e SHARD_NUMBER \
2022-12-01T10:19:46.8209043Z   -e TEST_CONFIG \
2022-12-01T10:19:46.8209309Z   -e NUM_TEST_SHARDS \
2022-12-01T10:19:46.8209555Z   -e PR_BODY \
2022-12-01T10:19:46.8209820Z   -e COMMIT_MESSAGES \
2022-12-01T10:19:46.8210114Z   -e PYTORCH_RETRY_TEST_CASES \
2022-12-01T10:19:46.8210411Z   -e PYTORCH_OVERRIDE_FLAKY_SIGNAL \
2022-12-01T10:19:46.8210698Z   -e PR_LABELS \
2022-12-01T10:19:46.8210986Z   -e MAX_JOBS="$(nproc --ignore=2)" \
2022-12-01T10:19:46.8211261Z   -e SCCACHE_BUCKET \
2022-12-01T10:19:46.8211542Z   -e SCCACHE_S3_KEY_PREFIX \
2022-12-01T10:19:46.8211810Z   -e XLA_CUDA \
2022-12-01T10:19:46.8212093Z   -e XLA_CLANG_CACHE_S3_BUCKET_NAME \
2022-12-01T10:19:46.8212418Z   --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
2022-12-01T10:19:46.8212744Z   --ulimit stack=10485760:83886080 \
2022-12-01T10:19:46.8213064Z   --security-opt seccomp=unconfined \
2022-12-01T10:19:46.8213355Z   --cap-add=SYS_PTRACE \
2022-12-01T10:19:46.8213623Z   --ipc=host \
2022-12-01T10:19:46.8213987Z   --shm-size="${SHM_SIZE}" \
2022-12-01T10:19:46.8214253Z   --tty \
2022-12-01T10:19:46.8214492Z   --detach \
2022-12-01T10:19:46.8214764Z   --name="${container_name}" \
2022-12-01T10:19:46.8215023Z   --user jenkins \
2022-12-01T10:19:46.8215348Z   -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
2022-12-01T10:19:46.8215688Z   -w /var/lib/jenkins/workspace \
2022-12-01T10:19:46.8215971Z   "${DOCKER_IMAGE}"
2022-12-01T10:19:46.8216196Z )
2022-12-01T10:19:46.8216563Z docker exec -t "${container_name}" sh -c "pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}"
2022-12-01T10:19:46.8228395Z shell: /usr/bin/bash -e {0}
2022-12-01T10:19:46.8228646Z env:
2022-12-01T10:19:46.8228885Z   GIT_DEFAULT_BRANCH: master
2022-12-01T10:19:46.8229138Z   GPU_FLAG: --gpus all
2022-12-01T10:19:46.8229463Z   BUILD_ENVIRONMENT: linux-bionic-cuda11.6-py3.10-gcc7
2022-12-01T10:19:46.8229780Z   PR_NUMBER: 89997
2022-12-01T10:19:46.8230015Z   BRANCH: pull/89997
2022-12-01T10:19:46.8230306Z   SHA1: c13d400bffe90e16b96520bbc8a41a6f0c9cd584
2022-12-01T10:19:46.8230635Z   BASE_SHA: ae2fe4033cf3b17259b17f351020b988fa893f91
2022-12-01T10:19:46.8230923Z PYTORCH_RETRY_TEST_CASES: 1 2022-12-01T10:19:46.8231208Z PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1 2022-12-01T10:19:46.8231488Z TEST_CONFIG: distributed 2022-12-01T10:19:46.8231740Z SHARD_NUMBER: 3 2022-12-01T10:19:46.8231963Z NUM_TEST_SHARDS: 3 2022-12-01T10:19:46.8232827Z PR_BODY: Link to landed master PR (if applicable): https://github.com/pytorch/pytorch/pull/88993 Criteria category: 1: This prevents a crash, which was introduced [here](https://github.com/pytorch/pytorch/commit/4b7de265569f7fd731dd1cfea83ce804cc22f7c0#diff-45b117ca26b4fc9174fbe0d7a9cd8cb1c43964cd5e2bb20c7778ee00a942ef63), tagged for 1.13 2: Prevents a crash I'm hoping this is a low-risk change, since it's just changing one method for its safer form. 2022-12-01T10:19:46.8233814Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2022-12-01T10:19:46.8234144Z SCCACHE_S3_KEY_PREFIX: pull 2022-12-01T10:19:46.8234382Z SHM_SIZE: 2g 2022-12-01T10:19:46.8234864Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fa72f5a0a230eb632055220542038bd4ceca184b 2022-12-01T10:19:46.8235326Z XLA_CUDA: 2022-12-01T10:19:46.8235672Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2022-12-01T10:19:46.8236008Z ##[endgroup] 2022-12-01T10:19:46.8264782Z + [[ distributed == \m\u\l\t\i\g\p\u ]] 2022-12-01T10:19:46.8265260Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *onnx* ]] 2022-12-01T10:19:46.8265605Z + TEST_COMMAND=.jenkins/pytorch/test.sh 2022-12-01T10:19:46.8268768Z ++ git cherry -v origin/master 2022-12-01T10:19:47.2943620Z + COMMIT_MESSAGES='+ 56508e29e6b1847e050462d818021a0389d0de99 Release 1.13, Install torch from test channel, Pin build… (#86290) 2022-12-01T10:19:47.2944513Z - f682048bf9475d324044e955d8d67464ee446c3a Fix for the binary upload (#86385) 2022-12-01T10:19:47.2945118Z + 95112ca043c37e38101e629c1f69663e911cc909 Fix binary builds for the release - unblock release (#86484) 2022-12-01T10:19:47.2945780Z - c38dbd0e1dd88cce783c8dce6cce1a97276b6bb9 Conditionally build the TestApp benchmark based on lite interpreter (#86314) (#86377) 2022-12-01T10:19:47.2946653Z - 50a9cd95baf29e136fdf5f027c6f4c4a7ccd5b9b Add version selector back to functorch docs (#86602) (#86689) 2022-12-01T10:19:47.2947821Z - 5be45fc4f1a86624fe5ce0e99a2a5558011ca69e [ROCm] set nvfuser default to disabled, keep CI (#86369) (#86725) 2022-12-01T10:19:47.2948429Z - f37023b03f3e92dadb247e89fd4e024eb4a0eb8a [MPS] Better error message for `slow_conv2d_forward` (#86844) 2022-12-01T10:19:47.2949431Z - 786431cd13d171965feeb17240cc7bf8dc509dba [DOC] Use type hints to show annotation in the docs (#86851) 2022-12-01T10:19:47.2950467Z - 03992a6fb3dbd371fe41d3cd95c6589b45a10f14 Make the data types of output and input consistenst for batchnorm (#86784) 2022-12-01T10:19:47.2951129Z + d9ddab5efa2224704e4b982e651589b31b1c1547 [1.13] Remove torch.vmap (#86333) 2022-12-01T10:19:47.2951739Z + 01042487e2d848d17351555ea73c61d38a60a7f9 [1.13] Release-only improvements to functorch docs (#86693) 2022-12-01T10:19:47.2952260Z - 71251e2521ac4b737af43d536dff359df256eadd ci: Just use regular checkout (#86824) (#86895) 2022-12-01T10:19:47.2952723Z + f65ac2ad20f8eab95a0d0fb93c3550afc2b86d91 [CI] Fix builder ref for release, linux only (#86904) 2022-12-01T10:19:47.2953346Z - 51c16f092ca8d7802decfd56d34ae99d1c0f335c [functorch] Add more details to the functorch install page (#86823) (#86903) 2022-12-01T10:19:47.2953996Z - 311b47b72c73509a82ed0b613a7c563e6628332b [quant] Move the order of x86 engine 
to avoid changing the default qengine (#86631) (#86726) 2022-12-01T10:19:47.2954505Z + 7e196948ca0c5ede7fa6e0831fda18135e6573a5 Add vmap support for slogdet; fix regression from functorch 0.2.1 (#86815) (#86902) 2022-12-01T10:19:47.2955159Z - aab841e89d850c9da9048a64541cfa78b222f1ff [ONNX] Support device().type() string comparison with constant (#86168) (#86921) 2022-12-01T10:19:47.2955769Z - de3fa48e38e5370d18748e131bd20db53390ea5c Install c10d headers with absolute path (#86257) (#86933) 2022-12-01T10:19:47.2956236Z + 366c59e2606368a6f2ee7165d58686d0d494dd92 Allow PrivateUse1 backends to not have Storage (#86557) (#86803) 2022-12-01T10:19:47.2956636Z + 492d572086a93a31fc8972d29726453ee1e46717 [functorch] fix cross (#86934) 2022-12-01T10:19:47.2957280Z - f89a762a8efcca77722071ec042a02742ec4537d Fix the performance issue that the for-loop before ExternallCall could not be parallelized. (#85056) (#86516) 2022-12-01T10:19:47.2957939Z - c40a04415c59edc45b2ceba682377b255f2091c7 Add error checking to flaky test bot platform parser (#86632) (#87201) 2022-12-01T10:19:47.2958448Z + eda8263f151a17d0d81d33e3f48fb2102daf4c14 [Release 1.13.0][DataPipe] Documentation and interface fix (#87140) 2022-12-01T10:19:47.2959011Z + d2325f3b6a17e54b5d48c68d82b74354dfd35f61 handle libomp update on circleci (#86979) (#87132) 2022-12-01T10:19:47.2959592Z - 7688e5eeabce25b67aa99280772d094b1c5f7091 [ONNX] Fix triu/tril export with diagonal input (#86843) (#86925) 2022-12-01T10:19:47.2960184Z - 1ebff1dcaf75d0305dc713eea104ddfa3ae6b588 [ONNX] Renable assert diagnostic test (#85999) (#86924) 2022-12-01T10:19:47.2960783Z - 1442f04f3bab3b0e0f22fbb68c8f328e17801c69 Enables lazy loading of cuda modules if not set by user (#86509) 2022-12-01T10:19:47.2961240Z + d324341922beb064057e12daae19199480885d5e Bug fix for torch._C._willEngineExecuteNode (#86672) 2022-12-01T10:19:47.2961862Z - 894bad72c9dfb946e9bdfebaf2db41473b3dd4fe [ONNX] Fix scalar_type_analysis metadata for copied constant (#86716) (#86923) 2022-12-01T10:19:47.2962906Z - ef6dd0d34d379604eec8db34dd58581b8b061249 [ONNX] Ignore print(Tensor) during tracing (#86223) (#87122) 2022-12-01T10:19:47.2963521Z - e6d8c1392c817636c15ad974dd160292cdc471bc [einsum] Fix opt_einsum defaults to be more reasonable (#86985) (#87144) 2022-12-01T10:19:47.2964059Z + 3af7c7fc98af441ca90ef38b262497b530a317fb [einsum] fix MPS regression and fix incorrect contraction order when path is None (#87261) 2022-12-01T10:19:47.2964702Z - 9e3df4920f75fa2aaf5bbe04ed302fc80008a8b9 Improve NestedTensor documentation (#85186) (#87337) 2022-12-01T10:19:47.2965303Z - 3ca59f4b8aff32fbad3fcd13b812b5335047b8ce Reenable aot tests on windows for cuda 11.7 and up (#87193) (#87307) 2022-12-01T10:19:47.2965870Z - 0948dbb3e7d6e5e99305b71423f7e9a45a6e5e14 Add torch.sparse overview section (#85265) (#87376) 2022-12-01T10:19:47.2966485Z - 1f80ac7fa72cacafbb4901109b465db05f91d818 [Docs] Update mm family ops and F.linear to note limited sparse support. 
(#86220) (#87379) 2022-12-01T10:19:47.2967128Z - 7342a3b0eac2e3ebf4d9b5ce1c3c73c4134c8751 Add prototype warning to MaskedTensor constructor (#87107) (#87380) 2022-12-01T10:19:47.2967703Z - aba5544affda233f8413b5fd74df1cdf7ed91c18 [maskedtensor] add docs (#84887) (#87381) 2022-12-01T10:19:47.2968415Z - 6e2cb22d74d1939e0aa5c00c665a58e8796bef31 [functorch][docs] Downgrade the warning about forward-mode AD coverage (#87383) (#87386) 2022-12-01T10:19:47.2969005Z + ca0ee922bc9440bfc86b332d69ee0559f748bc6e [release] Add warning to stateless.functional_call for deprecated behavior (#87079) 2022-12-01T10:19:47.2969645Z - d2ce4310bad94f2e30fd75cb14e175bcef7ba6fa Assert if padding mask type is unexpected (#87106) (#87384) 2022-12-01T10:19:47.2970245Z - 05cb998e11d109a55ddee0fb72c17aa07fa8104a [MPS] Do not dispatch empty job in `bitwise_not` (#87286) (#87427) 2022-12-01T10:19:47.2970780Z - 385022c76e4229a9393d63871225e0961e91c2db [ci] handle libomp upgrade on github (#87382) (#87408) 2022-12-01T10:19:47.2971327Z - 3ffddc0b2d5fc963463c98b518b0eb1f88788a74 [functorch] fix AOTAutograd tutorial (#87415) (#87434) 2022-12-01T10:19:47.2971776Z + 55c76baf579cb6593f87d1a23e9a49afeb55f15a [1.13] Fix functorch version docs (#87392) 2022-12-01T10:19:47.2972307Z - 59686b4c60289188436ac61576fd389821858fa5 [maskedtensor] fix docs formatting (#87387) (#87406) 2022-12-01T10:19:47.2972730Z + 0c0df0be7497112022804df03aeeb0fcbadc9243 Add `weights_only` option to `torch.load` (#87443) 2022-12-01T10:19:47.2973187Z + d253eb29d86af51ae17b950825ffdb5661b5af7f Avoid calling logging.basicConfig (#86959) (#87455) 2022-12-01T10:19:47.2986898Z - d3aecbd9bc58a42366abffa63748d73c872bd927 Delete torch::deploy from pytorch core (#85953) (#85953) (#87454) 2022-12-01T10:19:47.2987516Z - 51fa4fae41367a17f49ce648c7cdd6aa72f6e6ac Move PadNd from ATen/native to ATen (#87456) 2022-12-01T10:19:47.2988120Z - f6c42ae2c29ba788523149dbcc791bf14530f93d Reenable `isinstance` with `torch.distributed.ReduceOp` (#87303) (#87463) 2022-12-01T10:19:47.2988798Z - 6a8be2cb630ce79d654a707e5ea454d013acbda1 [ONNX] Reland: Update training state logic to support ScriptedModule (#86745) (#87457) 2022-12-01T10:19:47.2989303Z + 8569a44f38ff906103f40c6925dec366c7557943 [MPS] Revamp copy_to_mps_ implementation (#87475) 2022-12-01T10:19:47.2990095Z - fdb18da4cc3ee04c2423eb33b2780edc4cae64e0 Fix distributed issue by including distributed files (#87612) 2022-12-01T10:19:47.2990666Z - 341c377c0f595ff4e4a0fbdef21978c038b64b98 Add General Project Policies (#87385) (#87613) 2022-12-01T10:19:47.2991175Z - 4e1a4b150a4ccd5f401a4293a0336a9a8a1dad2d fix docs push (#87498) (#87628) 2022-12-01T10:19:47.2991725Z - 7c98e70d44abc7a1aead68b6ea6c8adc8c554db5 attempted fix for nvrtc with lovelace (#87611) (#87618) 2022-12-01T10:19:47.2992308Z - 74a9ca993bd79f8131829e9c946657fa9a1d05ef [JIT][Security] Do not blindly eval input string (#89189) (#89925) 2022-12-01T10:19:47.2992746Z + ae2fe4033cf3b17259b17f351020b988fa893f91 Update masked.rst (#89758) (#89923) 2022-12-01T10:19:47.2993282Z - c13d400bffe90e16b96520bbc8a41a6f0c9cd584 Use the Python frame safely in _pythonCallstack' 2022-12-01T10:19:47.3009244Z + COMMIT_MESSAGES='+ 56508e29e6b1847e050462d818021a0389d0de99 Release 1.13, Install torch from test channel, Pin build… (#86290)- f682048bf9475d324044e955d8d67464ee446c3a Fix for the binary upload (#86385)+ 95112ca043c37e38101e629c1f69663e911cc909 Fix binary builds for the release - unblock release (#86484)- c38dbd0e1dd88cce783c8dce6cce1a97276b6bb9 Conditionally build the TestApp 
benchmark based on lite interpreter (#86314) (#86377)- 50a9cd95baf29e136fdf5f027c6f4c4a7ccd5b9b Add version selector back to functorch docs (#86602) (#86689)- 5be45fc4f1a86624fe5ce0e99a2a5558011ca69e [ROCm] set nvfuser default to disabled, keep CI (#86369) (#86725)- f37023b03f3e92dadb247e89fd4e024eb4a0eb8a [MPS] Better error message for `slow_conv2d_forward` (#86844)- 786431cd13d171965feeb17240cc7bf8dc509dba [DOC] Use type hints to show annotation in the docs (#86851)- 03992a6fb3dbd371fe41d3cd95c6589b45a10f14 Make the data types of output and input consistenst for batchnorm (#86784)+ d9ddab5efa2224704e4b982e651589b31b1c1547 [1.13] Remove torch.vmap (#86333)+ 01042487e2d848d17351555ea73c61d38a60a7f9 [1.13] Release-only improvements to functorch docs (#86693)- 71251e2521ac4b737af43d536dff359df256eadd ci: Just use regular checkout (#86824) (#86895)+ f65ac2ad20f8eab95a0d0fb93c3550afc2b86d91 [CI] Fix builder ref for release, linux only (#86904)- 51c16f092ca8d7802decfd56d34ae99d1c0f335c [functorch] Add more details to the functorch install page (#86823) (#86903)- 311b47b72c73509a82ed0b613a7c563e6628332b [quant] Move the order of x86 engine to avoid changing the default qengine (#86631) (#86726)+ 7e196948ca0c5ede7fa6e0831fda18135e6573a5 Add vmap support for slogdet; fix regression from functorch 0.2.1 (#86815) (#86902)- aab841e89d850c9da9048a64541cfa78b222f1ff [ONNX] Support device().type() string comparison with constant (#86168) (#86921)- de3fa48e38e5370d18748e131bd20db53390ea5c Install c10d headers with absolute path (#86257) (#86933)+ 366c59e2606368a6f2ee7165d58686d0d494dd92 Allow PrivateUse1 backends to not have Storage (#86557) (#86803)+ 492d572086a93a31fc8972d29726453ee1e46717 [functorch] fix cross (#86934)- f89a762a8efcca77722071ec042a02742ec4537d Fix the performance issue that the for-loop before ExternallCall could not be parallelized. (#85056) (#86516)- c40a04415c59edc45b2ceba682377b255f2091c7 Add error checking to flaky test bot platform parser (#86632) (#87201)+ eda8263f151a17d0d81d33e3f48fb2102daf4c14 [Release 1.13.0][DataPipe] Documentation and interface fix (#87140)+ d2325f3b6a17e54b5d48c68d82b74354dfd35f61 handle libomp update on circleci (#86979) (#87132)- 7688e5eeabce25b67aa99280772d094b1c5f7091 [ONNX] Fix triu/tril export with diagonal input (#86843) (#86925)- 1ebff1dcaf75d0305dc713eea104ddfa3ae6b588 [ONNX] Renable assert diagnostic test (#85999) (#86924)- 1442f04f3bab3b0e0f22fbb68c8f328e17801c69 Enables lazy loading of cuda modules if not set by user (#86509)+ d324341922beb064057e12daae19199480885d5e Bug fix for torch._C._willEngineExecuteNode (#86672)- 894bad72c9dfb946e9bdfebaf2db41473b3dd4fe [ONNX] Fix scalar_type_analysis metadata for copied constant (#86716) (#86923)- ef6dd0d34d379604eec8db34dd58581b8b061249 [ONNX] Ignore print(Tensor) during tracing (#86223) (#87122)- e6d8c1392c817636c15ad974dd160292cdc471bc [einsum] Fix opt_einsum defaults to be more reasonable (#86985) (#87144)+ 3af7c7fc98af441ca90ef38b262497b530a317fb [einsum] fix MPS regression and fix incorrect contraction order when path is None (#87261)- 9e3df4920f75fa2aaf5bbe04ed302fc80008a8b9 Improve NestedTensor documentation (#85186) (#87337)- 3ca59f4b8aff32fbad3fcd13b812b5335047b8ce Reenable aot tests on windows for cuda 11.7 and up (#87193) (#87307)- 0948dbb3e7d6e5e99305b71423f7e9a45a6e5e14 Add torch.sparse overview section (#85265) (#87376)- 1f80ac7fa72cacafbb4901109b465db05f91d818 [Docs] Update mm family ops and F.linear to note limited sparse support. 
(#86220) (#87379)- 7342a3b0eac2e3ebf4d9b5ce1c3c73c4134c8751 Add prototype warning to MaskedTensor constructor (#87107) (#87380)- aba5544affda233f8413b5fd74df1cdf7ed91c18 [maskedtensor] add docs (#84887) (#87381)- 6e2cb22d74d1939e0aa5c00c665a58e8796bef31 [functorch][docs] Downgrade the warning about forward-mode AD coverage (#87383) (#87386)+ ca0ee922bc9440bfc86b332d69ee0559f748bc6e [release] Add warning to stateless.functional_call for deprecated behavior (#87079)- d2ce4310bad94f2e30fd75cb14e175bcef7ba6fa Assert if padding mask type is unexpected (#87106) (#87384)- 05cb998e11d109a55ddee0fb72c17aa07fa8104a [MPS] Do not dispatch empty job in `bitwise_not` (#87286) (#87427)- 385022c76e4229a9393d63871225e0961e91c2db [ci] handle libomp upgrade on github (#87382) (#87408)- 3ffddc0b2d5fc963463c98b518b0eb1f88788a74 [functorch] fix AOTAutograd tutorial (#87415) (#87434)+ 55c76baf579cb6593f87d1a23e9a49afeb55f15a [1.13] Fix functorch version docs (#87392)- 59686b4c60289188436ac61576fd389821858fa5 [maskedtensor] fix docs formatting (#87387) (#87406)+ 0c0df0be7497112022804df03aeeb0fcbadc9243 Add `weights_only` option to `torch.load` (#87443)+ d253eb29d86af51ae17b950825ffdb5661b5af7f Avoid calling logging.basicConfig (#86959) (#87455)- d3aecbd9bc58a42366abffa63748d73c872bd927 Delete torch::deploy from pytorch core (#85953) (#85953) (#87454)- 51fa4fae41367a17f49ce648c7cdd6aa72f6e6ac Move PadNd from ATen/native to ATen (#87456)- f6c42ae2c29ba788523149dbcc791bf14530f93d Reenable `isinstance` with `torch.distributed.ReduceOp` (#87303) (#87463)- 6a8be2cb630ce79d654a707e5ea454d013acbda1 [ONNX] Reland: Update training state logic to support ScriptedModule (#86745) (#87457)+ 8569a44f38ff906103f40c6925dec366c7557943 [MPS] Revamp copy_to_mps_ implementation (#87475)- fdb18da4cc3ee04c2423eb33b2780edc4cae64e0 Fix distributed issue by including distributed files (#87612)- 341c377c0f595ff4e4a0fbdef21978c038b64b98 Add General Project Policies (#87385) (#87613)- 4e1a4b150a4ccd5f401a4293a0336a9a8a1dad2d fix docs push (#87498) (#87628)- 7c98e70d44abc7a1aead68b6ea6c8adc8c554db5 attempted fix for nvrtc with lovelace (#87611) (#87618)- 74a9ca993bd79f8131829e9c946657fa9a1d05ef [JIT][Security] Do not blindly eval input string (#89189) (#89925)+ ae2fe4033cf3b17259b17f351020b988fa893f91 Update masked.rst (#89758) (#89923)- c13d400bffe90e16b96520bbc8a41a6f0c9cd584 Use the Python frame safely in _pythonCallstack' 2022-12-01T10:19:47.3018221Z + PR_BODY='Link to landed master PR (if applicable):https://github.com/pytorch/pytorch/pull/88993Criteria category:1: This prevents a crash, which was introduced [here](https://github.com/pytorch/pytorch/commit/4b7de265569f7fd731dd1cfea83ce804cc22f7c0#diff-45b117ca26b4fc9174fbe0d7a9cd8cb1c43964cd5e2bb20c7778ee00a942ef63), tagged for 1.132: Prevents a crashI'\''m hoping this is a low-risk change, since it'\''s just changing one method for its safer form.' 
2022-12-01T10:19:47.3034216Z + export 'COMMIT_MESSAGES=+ 56508e29e6b1847e050462d818021a0389d0de99 Release 1.13, Install torch from test channel, Pin build… (#86290)- f682048bf9475d324044e955d8d67464ee446c3a Fix for the binary upload (#86385)+ 95112ca043c37e38101e629c1f69663e911cc909 Fix binary builds for the release - unblock release (#86484)- c38dbd0e1dd88cce783c8dce6cce1a97276b6bb9 Conditionally build the TestApp benchmark based on lite interpreter (#86314) (#86377)- 50a9cd95baf29e136fdf5f027c6f4c4a7ccd5b9b Add version selector back to functorch docs (#86602) (#86689)- 5be45fc4f1a86624fe5ce0e99a2a5558011ca69e [ROCm] set nvfuser default to disabled, keep CI (#86369) (#86725)- f37023b03f3e92dadb247e89fd4e024eb4a0eb8a [MPS] Better error message for `slow_conv2d_forward` (#86844)- 786431cd13d171965feeb17240cc7bf8dc509dba [DOC] Use type hints to show annotation in the docs (#86851)- 03992a6fb3dbd371fe41d3cd95c6589b45a10f14 Make the data types of output and input consistenst for batchnorm (#86784)+ d9ddab5efa2224704e4b982e651589b31b1c1547 [1.13] Remove torch.vmap (#86333)+ 01042487e2d848d17351555ea73c61d38a60a7f9 [1.13] Release-only improvements to functorch docs (#86693)- 71251e2521ac4b737af43d536dff359df256eadd ci: Just use regular checkout (#86824) (#86895)+ f65ac2ad20f8eab95a0d0fb93c3550afc2b86d91 [CI] Fix builder ref for release, linux only (#86904)- 51c16f092ca8d7802decfd56d34ae99d1c0f335c [functorch] Add more details to the functorch install page (#86823) (#86903)- 311b47b72c73509a82ed0b613a7c563e6628332b [quant] Move the order of x86 engine to avoid changing the default qengine (#86631) (#86726)+ 7e196948ca0c5ede7fa6e0831fda18135e6573a5 Add vmap support for slogdet; fix regression from functorch 0.2.1 (#86815) (#86902)- aab841e89d850c9da9048a64541cfa78b222f1ff [ONNX] Support device().type() string comparison with constant (#86168) (#86921)- de3fa48e38e5370d18748e131bd20db53390ea5c Install c10d headers with absolute path (#86257) (#86933)+ 366c59e2606368a6f2ee7165d58686d0d494dd92 Allow PrivateUse1 backends to not have Storage (#86557) (#86803)+ 492d572086a93a31fc8972d29726453ee1e46717 [functorch] fix cross (#86934)- f89a762a8efcca77722071ec042a02742ec4537d Fix the performance issue that the for-loop before ExternallCall could not be parallelized. 
(#85056) (#86516)- c40a04415c59edc45b2ceba682377b255f2091c7 Add error checking to flaky test bot platform parser (#86632) (#87201)+ eda8263f151a17d0d81d33e3f48fb2102daf4c14 [Release 1.13.0][DataPipe] Documentation and interface fix (#87140)+ d2325f3b6a17e54b5d48c68d82b74354dfd35f61 handle libomp update on circleci (#86979) (#87132)- 7688e5eeabce25b67aa99280772d094b1c5f7091 [ONNX] Fix triu/tril export with diagonal input (#86843) (#86925)- 1ebff1dcaf75d0305dc713eea104ddfa3ae6b588 [ONNX] Renable assert diagnostic test (#85999) (#86924)- 1442f04f3bab3b0e0f22fbb68c8f328e17801c69 Enables lazy loading of cuda modules if not set by user (#86509)+ d324341922beb064057e12daae19199480885d5e Bug fix for torch._C._willEngineExecuteNode (#86672)- 894bad72c9dfb946e9bdfebaf2db41473b3dd4fe [ONNX] Fix scalar_type_analysis metadata for copied constant (#86716) (#86923)- ef6dd0d34d379604eec8db34dd58581b8b061249 [ONNX] Ignore print(Tensor) during tracing (#86223) (#87122)- e6d8c1392c817636c15ad974dd160292cdc471bc [einsum] Fix opt_einsum defaults to be more reasonable (#86985) (#87144)+ 3af7c7fc98af441ca90ef38b262497b530a317fb [einsum] fix MPS regression and fix incorrect contraction order when path is None (#87261)- 9e3df4920f75fa2aaf5bbe04ed302fc80008a8b9 Improve NestedTensor documentation (#85186) (#87337)- 3ca59f4b8aff32fbad3fcd13b812b5335047b8ce Reenable aot tests on windows for cuda 11.7 and up (#87193) (#87307)- 0948dbb3e7d6e5e99305b71423f7e9a45a6e5e14 Add torch.sparse overview section (#85265) (#87376)- 1f80ac7fa72cacafbb4901109b465db05f91d818 [Docs] Update mm family ops and F.linear to note limited sparse support. (#86220) (#87379)- 7342a3b0eac2e3ebf4d9b5ce1c3c73c4134c8751 Add prototype warning to MaskedTensor constructor (#87107) (#87380)- aba5544affda233f8413b5fd74df1cdf7ed91c18 [maskedtensor] add docs (#84887) (#87381)- 6e2cb22d74d1939e0aa5c00c665a58e8796bef31 [functorch][docs] Downgrade the warning about forward-mode AD coverage (#87383) (#87386)+ ca0ee922bc9440bfc86b332d69ee0559f748bc6e [release] Add warning to stateless.functional_call for deprecated behavior (#87079)- d2ce4310bad94f2e30fd75cb14e175bcef7ba6fa Assert if padding mask type is unexpected (#87106) (#87384)- 05cb998e11d109a55ddee0fb72c17aa07fa8104a [MPS] Do not dispatch empty job in `bitwise_not` (#87286) (#87427)- 385022c76e4229a9393d63871225e0961e91c2db [ci] handle libomp upgrade on github (#87382) (#87408)- 3ffddc0b2d5fc963463c98b518b0eb1f88788a74 [functorch] fix AOTAutograd tutorial (#87415) (#87434)+ 55c76baf579cb6593f87d1a23e9a49afeb55f15a [1.13] Fix functorch version docs (#87392)- 59686b4c60289188436ac61576fd389821858fa5 [maskedtensor] fix docs formatting (#87387) (#87406)+ 0c0df0be7497112022804df03aeeb0fcbadc9243 Add `weights_only` option to `torch.load` (#87443)+ d253eb29d86af51ae17b950825ffdb5661b5af7f Avoid calling logging.basicConfig (#86959) (#87455)- d3aecbd9bc58a42366abffa63748d73c872bd927 Delete torch::deploy from pytorch core (#85953) (#85953) (#87454)- 51fa4fae41367a17f49ce648c7cdd6aa72f6e6ac Move PadNd from ATen/native to ATen (#87456)- f6c42ae2c29ba788523149dbcc791bf14530f93d Reenable `isinstance` with `torch.distributed.ReduceOp` (#87303) (#87463)- 6a8be2cb630ce79d654a707e5ea454d013acbda1 [ONNX] Reland: Update training state logic to support ScriptedModule (#86745) (#87457)+ 8569a44f38ff906103f40c6925dec366c7557943 [MPS] Revamp copy_to_mps_ implementation (#87475)- fdb18da4cc3ee04c2423eb33b2780edc4cae64e0 Fix distributed issue by including distributed files (#87612)- 341c377c0f595ff4e4a0fbdef21978c038b64b98 Add 
General Project Policies (#87385) (#87613)- 4e1a4b150a4ccd5f401a4293a0336a9a8a1dad2d fix docs push (#87498) (#87628)- 7c98e70d44abc7a1aead68b6ea6c8adc8c554db5 attempted fix for nvrtc with lovelace (#87611) (#87618)- 74a9ca993bd79f8131829e9c946657fa9a1d05ef [JIT][Security] Do not blindly eval input string (#89189) (#89925)+ ae2fe4033cf3b17259b17f351020b988fa893f91 Update masked.rst (#89758) (#89923)- c13d400bffe90e16b96520bbc8a41a6f0c9cd584 Use the Python frame safely in _pythonCallstack' 2022-12-01T10:19:47.3057852Z + COMMIT_MESSAGES='+ 56508e29e6b1847e050462d818021a0389d0de99 Release 1.13, Install torch from test channel, Pin build… (#86290)- f682048bf9475d324044e955d8d67464ee446c3a Fix for the binary upload (#86385)+ 95112ca043c37e38101e629c1f69663e911cc909 Fix binary builds for the release - unblock release (#86484)- c38dbd0e1dd88cce783c8dce6cce1a97276b6bb9 Conditionally build the TestApp benchmark based on lite interpreter (#86314) (#86377)- 50a9cd95baf29e136fdf5f027c6f4c4a7ccd5b9b Add version selector back to functorch docs (#86602) (#86689)- 5be45fc4f1a86624fe5ce0e99a2a5558011ca69e [ROCm] set nvfuser default to disabled, keep CI (#86369) (#86725)- f37023b03f3e92dadb247e89fd4e024eb4a0eb8a [MPS] Better error message for `slow_conv2d_forward` (#86844)- 786431cd13d171965feeb17240cc7bf8dc509dba [DOC] Use type hints to show annotation in the docs (#86851)- 03992a6fb3dbd371fe41d3cd95c6589b45a10f14 Make the data types of output and input consistenst for batchnorm (#86784)+ d9ddab5efa2224704e4b982e651589b31b1c1547 [1.13] Remove torch.vmap (#86333)+ 01042487e2d848d17351555ea73c61d38a60a7f9 [1.13] Release-only improvements to functorch docs (#86693)- 71251e2521ac4b737af43d536dff359df256eadd ci: Just use regular checkout (#86824) (#86895)+ f65ac2ad20f8eab95a0d0fb93c3550afc2b86d91 [CI] Fix builder ref for release, linux only (#86904)- 51c16f092ca8d7802decfd56d34ae99d1c0f335c [functorch] Add more details to the functorch install page (#86823) (#86903)- 311b47b72c73509a82ed0b613a7c563e6628332b [quant] Move the order of x86 engine to avoid changing the default qengine (#86631) (#86726)+ 7e196948ca0c5ede7fa6e0831fda18135e6573a5 Add vmap support for slogdet; fix regression from functorch 0.2.1 (#86815) (#86902)- aab841e89d850c9da9048a64541cfa78b222f1ff [ONNX] Support device().type() string comparison with constant (#86168) (#86921)- de3fa48e38e5370d18748e131bd20db53390ea5c Install c10d headers with absolute path (#86257) (#86933)+ 366c59e2606368a6f2ee7165d58686d0d494dd92 Allow PrivateUse1 backends to not have Storage (#86557) (#86803)+ 492d572086a93a31fc8972d29726453ee1e46717 [functorch] fix cross (#86934)- f89a762a8efcca77722071ec042a02742ec4537d Fix the performance issue that the for-loop before ExternallCall could not be parallelized. 
(#85056) (#86516)- c40a04415c59edc45b2ceba682377b255f2091c7 Add error checking to flaky test bot platform parser (#86632) (#87201)+ eda8263f151a17d0d81d33e3f48fb2102daf4c14 [Release 1.13.0][DataPipe] Documentation and interface fix (#87140)+ d2325f3b6a17e54b5d48c68d82b74354dfd35f61 handle libomp update on circleci (#86979) (#87132)- 7688e5eeabce25b67aa99280772d094b1c5f7091 [ONNX] Fix triu/tril export with diagonal input (#86843) (#86925)- 1ebff1dcaf75d0305dc713eea104ddfa3ae6b588 [ONNX] Renable assert diagnostic test (#85999) (#86924)- 1442f04f3bab3b0e0f22fbb68c8f328e17801c69 Enables lazy loading of cuda modules if not set by user (#86509)+ d324341922beb064057e12daae19199480885d5e Bug fix for torch._C._willEngineExecuteNode (#86672)- 894bad72c9dfb946e9bdfebaf2db41473b3dd4fe [ONNX] Fix scalar_type_analysis metadata for copied constant (#86716) (#86923)- ef6dd0d34d379604eec8db34dd58581b8b061249 [ONNX] Ignore print(Tensor) during tracing (#86223) (#87122)- e6d8c1392c817636c15ad974dd160292cdc471bc [einsum] Fix opt_einsum defaults to be more reasonable (#86985) (#87144)+ 3af7c7fc98af441ca90ef38b262497b530a317fb [einsum] fix MPS regression and fix incorrect contraction order when path is None (#87261)- 9e3df4920f75fa2aaf5bbe04ed302fc80008a8b9 Improve NestedTensor documentation (#85186) (#87337)- 3ca59f4b8aff32fbad3fcd13b812b5335047b8ce Reenable aot tests on windows for cuda 11.7 and up (#87193) (#87307)- 0948dbb3e7d6e5e99305b71423f7e9a45a6e5e14 Add torch.sparse overview section (#85265) (#87376)- 1f80ac7fa72cacafbb4901109b465db05f91d818 [Docs] Update mm family ops and F.linear to note limited sparse support. (#86220) (#87379)- 7342a3b0eac2e3ebf4d9b5ce1c3c73c4134c8751 Add prototype warning to MaskedTensor constructor (#87107) (#87380)- aba5544affda233f8413b5fd74df1cdf7ed91c18 [maskedtensor] add docs (#84887) (#87381)- 6e2cb22d74d1939e0aa5c00c665a58e8796bef31 [functorch][docs] Downgrade the warning about forward-mode AD coverage (#87383) (#87386)+ ca0ee922bc9440bfc86b332d69ee0559f748bc6e [release] Add warning to stateless.functional_call for deprecated behavior (#87079)- d2ce4310bad94f2e30fd75cb14e175bcef7ba6fa Assert if padding mask type is unexpected (#87106) (#87384)- 05cb998e11d109a55ddee0fb72c17aa07fa8104a [MPS] Do not dispatch empty job in `bitwise_not` (#87286) (#87427)- 385022c76e4229a9393d63871225e0961e91c2db [ci] handle libomp upgrade on github (#87382) (#87408)- 3ffddc0b2d5fc963463c98b518b0eb1f88788a74 [functorch] fix AOTAutograd tutorial (#87415) (#87434)+ 55c76baf579cb6593f87d1a23e9a49afeb55f15a [1.13] Fix functorch version docs (#87392)- 59686b4c60289188436ac61576fd389821858fa5 [maskedtensor] fix docs formatting (#87387) (#87406)+ 0c0df0be7497112022804df03aeeb0fcbadc9243 Add `weights_only` option to `torch.load` (#87443)+ d253eb29d86af51ae17b950825ffdb5661b5af7f Avoid calling logging.basicConfig (#86959) (#87455)- d3aecbd9bc58a42366abffa63748d73c872bd927 Delete torch::deploy from pytorch core (#85953) (#85953) (#87454)- 51fa4fae41367a17f49ce648c7cdd6aa72f6e6ac Move PadNd from ATen/native to ATen (#87456)- f6c42ae2c29ba788523149dbcc791bf14530f93d Reenable `isinstance` with `torch.distributed.ReduceOp` (#87303) (#87463)- 6a8be2cb630ce79d654a707e5ea454d013acbda1 [ONNX] Reland: Update training state logic to support ScriptedModule (#86745) (#87457)+ 8569a44f38ff906103f40c6925dec366c7557943 [MPS] Revamp copy_to_mps_ implementation (#87475)- fdb18da4cc3ee04c2423eb33b2780edc4cae64e0 Fix distributed issue by including distributed files (#87612)- 341c377c0f595ff4e4a0fbdef21978c038b64b98 Add 
General Project Policies (#87385) (#87613)- 4e1a4b150a4ccd5f401a4293a0336a9a8a1dad2d fix docs push (#87498) (#87628)- 7c98e70d44abc7a1aead68b6ea6c8adc8c554db5 attempted fix for nvrtc with lovelace (#87611) (#87618)- 74a9ca993bd79f8131829e9c946657fa9a1d05ef [JIT][Security] Do not blindly eval input string (#89189) (#89925)+ ae2fe4033cf3b17259b17f351020b988fa893f91 Update masked.rst (#89758) (#89923)- c13d400bffe90e16b96520bbc8a41a6f0c9cd584 Use the Python frame safely in _pythonCallstack' 2022-12-01T10:19:47.3066725Z + export 'PR_BODY=Link to landed master PR (if applicable):https://github.com/pytorch/pytorch/pull/88993Criteria category:1: This prevents a crash, which was introduced [here](https://github.com/pytorch/pytorch/commit/4b7de265569f7fd731dd1cfea83ce804cc22f7c0#diff-45b117ca26b4fc9174fbe0d7a9cd8cb1c43964cd5e2bb20c7778ee00a942ef63), tagged for 1.132: Prevents a crashIm hoping this is a low-risk change, since its just changing one method for its safer form.' 2022-12-01T10:19:47.3068713Z + PR_BODY='Link to landed master PR (if applicable):https://github.com/pytorch/pytorch/pull/88993Criteria category:1: This prevents a crash, which was introduced [here](https://github.com/pytorch/pytorch/commit/4b7de265569f7fd731dd1cfea83ce804cc22f7c0#diff-45b117ca26b4fc9174fbe0d7a9cd8cb1c43964cd5e2bb20c7778ee00a942ef63), tagged for 1.132: Prevents a crashIm hoping this is a low-risk change, since its just changing one method for its safer form.' 2022-12-01T10:19:47.3069663Z +++ nproc --ignore=2 2022-12-01T10:19:47.3071811Z ++ docker run --gpus all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e PR_BODY -e COMMIT_MESSAGES -e PYTORCH_RETRY_TEST_CASES -e PYTORCH_OVERRIDE_FLAKY_SIGNAL -e PR_LABELS -e MAX_JOBS=30 -e SCCACHE_BUCKET -e SCCACHE_S3_KEY_PREFIX -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME --env-file=/tmp/github_env_3591403534 --ulimit stack=10485760:83886080 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fa72f5a0a230eb632055220542038bd4ceca184b 2022-12-01T10:20:01.0068125Z + container_name=66307f3ad701b5e8a32dadc4f1fd99633efca32cc1fa29cf0780fcc5b09884b4 2022-12-01T10:20:01.0072415Z ++ echo dist/torch-1.13.0a0+gitc13d400-cp310-cp310-linux_x86_64.whl 2022-12-01T10:20:01.0074375Z + docker exec -t 66307f3ad701b5e8a32dadc4f1fd99633efca32cc1fa29cf0780fcc5b09884b4 sh -c 'pip install dist/torch-1.13.0a0+gitc13d400-cp310-cp310-linux_x86_64.whl[opt-einsum] && .jenkins/pytorch/test.sh' 2022-12-01T10:20:01.6378671Z Processing ./dist/torch-1.13.0a0+gitc13d400-cp310-cp310-linux_x86_64.whl 2022-12-01T10:20:01.7348289Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch==1.13.0a0+gitc13d400) (4.3.0) 2022-12-01T10:20:01.7361537Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/lib/python3.10/site-packages (from torch==1.13.0a0+gitc13d400) (3.3.0) 2022-12-01T10:20:01.7445689Z Requirement already satisfied: numpy>=1.7 in /opt/conda/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==1.13.0a0+gitc13d400) (1.21.2) 2022-12-01T10:20:02.6997597Z Installing collected packages: torch 2022-12-01T10:20:13.5859898Z Successfully installed 
torch-1.13.0a0+gitc13d400 2022-12-01T10:20:13.7593323Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2022-12-01T10:20:13.7818105Z + TORCH_INSTALL_DIR=/opt/conda/lib/python3.10/site-packages/torch 2022-12-01T10:20:13.7818589Z + TORCH_BIN_DIR=/opt/conda/lib/python3.10/site-packages/torch/bin 2022-12-01T10:20:13.7821370Z + TORCH_LIB_DIR=/opt/conda/lib/python3.10/site-packages/torch/lib 2022-12-01T10:20:13.7821892Z + TORCH_TEST_DIR=/opt/conda/lib/python3.10/site-packages/torch/test 2022-12-01T10:20:13.7824567Z + BUILD_DIR=build 2022-12-01T10:20:13.7825119Z + BUILD_RENAMED_DIR=build_renamed 2022-12-01T10:20:13.7825546Z + BUILD_BIN_DIR=build/bin 2022-12-01T10:20:13.7825817Z + export VALGRIND=ON 2022-12-01T10:20:13.7826065Z + VALGRIND=ON 2022-12-01T10:20:13.7829706Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *clang9* ]] 2022-12-01T10:20:13.7830174Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 != *bazel* ]] 2022-12-01T10:20:13.7830503Z ++ realpath build/custom_test_artifacts 2022-12-01T10:20:13.7832664Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2022-12-01T10:20:13.7835656Z ++ dirname .jenkins/pytorch/test.sh 2022-12-01T10:20:13.7843483Z + source .jenkins/pytorch/common.sh 2022-12-01T10:20:13.7847735Z +++ dirname .jenkins/pytorch/common.sh 2022-12-01T10:20:13.7858762Z ++ source .jenkins/pytorch/common_utils.sh 2022-12-01T10:20:13.7861120Z +++ declare -f -t trap_add 2022-12-01T10:20:13.7865464Z ++ set -ex 2022-12-01T10:20:13.7865862Z ++ [[ linux-bionic-cuda11.6-py3.10-gcc7 == *rocm* ]] 2022-12-01T10:20:13.7866179Z ++ BUILD_TEST_LIBTORCH=0 2022-12-01T10:20:13.7867746Z ++ [[ distributed == *xla* ]] 2022-12-01T10:20:13.7868146Z ++ [[ linux-bionic-cuda11.6-py3.10-gcc7 == *centos* ]] 2022-12-01T10:20:13.7868601Z ++ [[ linux-bionic-cuda11.6-py3.10-gcc7 == *linux-bionic* ]] 2022-12-01T10:20:13.7868919Z ++ which conda 2022-12-01T10:20:13.7878050Z /opt/conda/bin/conda 2022-12-01T10:20:13.7878861Z ++ conda install -q -y cmake 2022-12-01T10:20:17.3763534Z Collecting package metadata (current_repodata.json): ...working... done 2022-12-01T10:20:18.6439133Z Solving environment: ...working... 
done 2022-12-01T10:20:18.7487772Z 2022-12-01T10:20:18.7488154Z ## Package Plan ## 2022-12-01T10:20:18.7488508Z 2022-12-01T10:20:18.7488717Z environment location: /opt/conda 2022-12-01T10:20:18.7488929Z 2022-12-01T10:20:18.7489070Z added / updated specs: 2022-12-01T10:20:18.7489474Z - cmake 2022-12-01T10:20:18.7489705Z 2022-12-01T10:20:18.7489731Z 2022-12-01T10:20:18.7490012Z The following packages will be downloaded: 2022-12-01T10:20:18.7490229Z 2022-12-01T10:20:18.7494153Z package | build 2022-12-01T10:20:18.7495063Z ---------------------------|----------------- 2022-12-01T10:20:18.7495928Z c-ares-1.18.1 | h7f8727e_0 114 KB 2022-12-01T10:20:18.7496644Z ca-certificates-2022.10.11 | h06a4308_0 124 KB 2022-12-01T10:20:18.7497059Z certifi-2022.9.24 | py310h06a4308_0 154 KB 2022-12-01T10:20:18.7497456Z cmake-3.22.1 | h1fce559_0 7.3 MB 2022-12-01T10:20:18.7497856Z expat-2.4.9 | h6a678d5_0 156 KB 2022-12-01T10:20:18.7498210Z krb5-1.19.2 | hac12032_0 1.2 MB 2022-12-01T10:20:18.7498773Z libcurl-7.84.0 | h91b91d3_0 337 KB 2022-12-01T10:20:18.7499198Z libedit-3.1.20210910 | h7f8727e_0 166 KB 2022-12-01T10:20:18.7499621Z libev-4.33 | h7f8727e_1 111 KB 2022-12-01T10:20:18.7499989Z libgcc-ng-11.2.0 | h1234567_1 5.3 MB 2022-12-01T10:20:18.7500374Z libgomp-11.2.0 | h1234567_1 474 KB 2022-12-01T10:20:18.7500765Z libnghttp2-1.46.0 | hce63b2e_0 680 KB 2022-12-01T10:20:18.7501158Z libssh2-1.10.0 | h8f2d780_0 274 KB 2022-12-01T10:20:18.7501531Z libstdcxx-ng-11.2.0 | h1234567_1 4.7 MB 2022-12-01T10:20:18.7501917Z libuv-1.40.0 | h7b6447c_0 736 KB 2022-12-01T10:20:18.7502294Z lz4-c-1.9.3 | h295c915_1 185 KB 2022-12-01T10:20:18.7502656Z openssl-1.1.1s | h7f8727e_0 3.6 MB 2022-12-01T10:20:18.7503037Z rhash-1.4.1 | h3c74f83_1 203 KB 2022-12-01T10:20:18.7503414Z zstd-1.5.2 | ha4553b6_0 488 KB 2022-12-01T10:20:18.7503795Z ------------------------------------------------------------ 2022-12-01T10:20:18.7504125Z Total: 26.3 MB 2022-12-01T10:20:18.7504299Z 2022-12-01T10:20:18.7504461Z The following NEW packages will be INSTALLED: 2022-12-01T10:20:18.7504665Z 2022-12-01T10:20:18.7505030Z c-ares pkgs/main/linux-64::c-ares-1.18.1-h7f8727e_0 None 2022-12-01T10:20:18.7505513Z cmake pkgs/main/linux-64::cmake-3.22.1-h1fce559_0 None 2022-12-01T10:20:18.7506000Z expat pkgs/main/linux-64::expat-2.4.9-h6a678d5_0 None 2022-12-01T10:20:18.7506476Z krb5 pkgs/main/linux-64::krb5-1.19.2-hac12032_0 None 2022-12-01T10:20:18.7507100Z libcurl pkgs/main/linux-64::libcurl-7.84.0-h91b91d3_0 None 2022-12-01T10:20:18.7507601Z libedit pkgs/main/linux-64::libedit-3.1.20210910-h7f8727e_0 None 2022-12-01T10:20:18.7508088Z libev pkgs/main/linux-64::libev-4.33-h7f8727e_1 None 2022-12-01T10:20:18.7508810Z libnghttp2 pkgs/main/linux-64::libnghttp2-1.46.0-hce63b2e_0 None 2022-12-01T10:20:18.7509305Z libssh2 pkgs/main/linux-64::libssh2-1.10.0-h8f2d780_0 None 2022-12-01T10:20:18.7509791Z libuv pkgs/main/linux-64::libuv-1.40.0-h7b6447c_0 None 2022-12-01T10:20:18.7510263Z lz4-c pkgs/main/linux-64::lz4-c-1.9.3-h295c915_1 None 2022-12-01T10:20:18.7510732Z rhash pkgs/main/linux-64::rhash-1.4.1-h3c74f83_1 None 2022-12-01T10:20:18.7511186Z zstd pkgs/main/linux-64::zstd-1.5.2-ha4553b6_0 None 2022-12-01T10:20:18.7511395Z 2022-12-01T10:20:18.7511540Z The following packages will be UPDATED: 2022-12-01T10:20:18.7511739Z 2022-12-01T10:20:18.7512015Z ca-certificates 2022.07.19-h06a4308_0 --> 2022.10.11-h06a4308_0 None 2022-12-01T10:20:18.7512501Z certifi 2022.9.14-py310h06a4308_0 --> 2022.9.24-py310h06a4308_0 None 2022-12-01T10:20:18.7512945Z libgcc-ng 
9.3.0-h5101ec6_17 --> 11.2.0-h1234567_1 None 2022-12-01T10:20:18.7513385Z libgomp 9.3.0-h5101ec6_17 --> 11.2.0-h1234567_1 None 2022-12-01T10:20:18.7513832Z libstdcxx-ng 9.3.0-hd4cf53a_17 --> 11.2.0-h1234567_1 None 2022-12-01T10:20:18.7514263Z openssl 1.1.1q-h7f8727e_0 --> 1.1.1s-h7f8727e_0 None 2022-12-01T10:20:18.7514472Z 2022-12-01T10:20:18.7514491Z 2022-12-01T10:20:20.3035498Z Preparing transaction: ...working... done 2022-12-01T10:20:20.9217500Z Verifying transaction: ...working... done 2022-12-01T10:20:21.9192123Z Executing transaction: ...working... done 2022-12-01T10:20:22.0549636Z Retrieving notices: ...working... done 2022-12-01T10:20:22.2627189Z ++ [[ linux-bionic-cuda11.6-py3.10-gcc7 == *centos* ]] 2022-12-01T10:20:22.2627837Z + echo 'Environment variables' 2022-12-01T10:20:22.2628331Z Environment variables 2022-12-01T10:20:22.2628581Z + env 2022-12-01T10:20:22.2636263Z SHARD_NUMBER=3 2022-12-01T10:20:22.2636652Z NV_LIBCUBLAS_DEV_VERSION=11.9.2.110-1 2022-12-01T10:20:22.2637138Z NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-6 2022-12-01T10:20:22.2637852Z LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 2022-12-01T10:20:22.2638674Z NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.12.10-1+cuda11.6 2022-12-01T10:20:22.2639223Z UCC_HOME=/usr 2022-12-01T10:20:22.2639978Z BUILD_ENVIRONMENT=linux-bionic-cuda11.6-py3.10-gcc7 2022-12-01T10:20:22.2640810Z NV_LIBNPP_DEV_PACKAGE=libnpp-dev-11-6=11.6.3.124-1 2022-12-01T10:20:22.2641388Z INSTALLED_DB=yes 2022-12-01T10:20:22.2641843Z HOSTNAME=66307f3ad701 2022-12-01T10:20:22.2642305Z GITHUB_REF_NAME=89997/merge 2022-12-01T10:20:22.2643366Z GITHUB_API_URL=https://api.github.com 2022-12-01T10:20:22.2644055Z OPENSSL_DIR=/opt/openssl 2022-12-01T10:20:22.2644625Z UCC_COMMIT=12944da33f911daf505d9bbc51411233d0ed85e1 2022-12-01T10:20:22.2645852Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_2385ff8c-424a-4358-abd6-a3cab2535f9d 2022-12-01T10:20:22.2646663Z CUDA_PATH=/usr/local/cuda 2022-12-01T10:20:22.2647646Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2022-12-01T10:20:22.2648363Z GITHUB_RUN_ATTEMPT=1 2022-12-01T10:20:22.2648860Z TEST_CONFIG=distributed 2022-12-01T10:20:22.2649454Z NV_LIBNPP_VERSION=11.6.3.124-1 2022-12-01T10:20:22.2650153Z NV_NVPROF_DEV_PACKAGE=cuda-nvprof-11-6=11.6.124-1 2022-12-01T10:20:22.2650838Z GITHUB_REPOSITORY_OWNER=pytorch 2022-12-01T10:20:22.2651298Z GITHUB_ACTIONS=true 2022-12-01T10:20:22.2651772Z NVIDIA_VISIBLE_DEVICES=all 2022-12-01T10:20:22.2652360Z NV_NVPROF_VERSION=11.6.124-1 2022-12-01T10:20:22.2653221Z NV_LIBCUSPARSE_VERSION=11.7.2.124-1 2022-12-01T10:20:22.2653696Z CI=true 2022-12-01T10:20:22.2654060Z PYTORCH_OVERRIDE_FLAKY_SIGNAL=1 2022-12-01T10:20:22.2654494Z NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-11-6=11.9.2.110-1 2022-12-01T10:20:22.2654789Z BRANCH=pull/89997 2022-12-01T10:20:22.2655117Z GITHUB_HEAD_REF=release/1.13-callstack 2022-12-01T10:20:22.2655459Z UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab 2022-12-01T10:20:22.2655796Z GITHUB_ACTOR=charlie-wt 2022-12-01T10:20:22.2656105Z CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache 2022-12-01T10:20:22.2656400Z GITHUB_ACTION_REF= 2022-12-01T10:20:22.2656658Z NCCL_VERSION=2.12.10-1 2022-12-01T10:20:22.2656916Z GITHUB_ACTION=__self 2022-12-01T10:20:22.2657161Z VALGRIND=ON 2022-12-01T10:20:22.2657398Z GITHUB_REF_PROTECTED=false 2022-12-01T10:20:22.2657846Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2022-12-01T10:20:22.2658530Z *** 
2022-12-01T10:20:22.2658770Z INSTALLED_VISION=yes 2022-12-01T10:20:22.2659006Z NVARCH=x86_64 2022-12-01T10:20:22.2659313Z NV_LIBCUSPARSE_DEV_VERSION=11.7.2.124-1 2022-12-01T10:20:22.2659597Z HOME=/var/lib/jenkins 2022-12-01T10:20:22.2660116Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_2385ff8c-424a-4358-abd6-a3cab2535f9d 2022-12-01T10:20:22.2660533Z CARGO_NET_GIT_FETCH_WITH_CLI=true 2022-12-01T10:20:22.2660817Z GITHUB_ACTION_REPOSITORY= 2022-12-01T10:20:22.2661067Z GITHUB_REF_TYPE=branch 2022-12-01T10:20:22.2661381Z NV_LIBNCCL_PACKAGE_VERSION=2.12.10-1 2022-12-01T10:20:22.2661672Z GITHUB_RETENTION_DAYS=90 2022-12-01T10:20:22.2662035Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2022-12-01T10:20:22.2662449Z NV_LIBNCCL_PACKAGE=libnccl2=2.12.10-1+cuda11.6 2022-12-01T10:20:22.2662999Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_2385ff8c-424a-4358-abd6-a3cab2535f9d 2022-12-01T10:20:22.2663392Z DEBIAN_FRONTEND=noninteractive 2022-12-01T10:20:22.2663747Z NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev 2022-12-01T10:20:22.2664058Z GITHUB_REF=refs/pull/89997/merge 2022-12-01T10:20:22.2664364Z NV_CUDA_LIB_VERSION=11.6.2-1 2022-12-01T10:20:22.2664661Z GITHUB_SHA=cdc064133ad5e5a46a756ee9218659e8f252e950 2022-12-01T10:20:22.2665097Z INSTALLED_PROTOBUF=yes 2022-12-01T10:20:22.2665387Z GITHUB_RUN_ID=3591403534 2022-12-01T10:20:22.2665723Z NV_LIBNPP_PACKAGE=libnpp-11-6=11.6.3.124-1 2022-12-01T10:20:22.2666032Z NV_LIBNCCL_PACKAGE_NAME=libnccl2 2022-12-01T10:20:22.2666334Z LIBRARY_PATH=/usr/local/cuda/lib64/stubs 2022-12-01T10:20:22.2666628Z NV_NVTX_VERSION=11.6.124-1 2022-12-01T10:20:22.2666928Z GITHUB_SERVER_URL=https://github.com 2022-12-01T10:20:22.2667207Z MAX_JOBS=30 2022-12-01T10:20:22.2667472Z NV_LIBCUBLAS_VERSION=11.9.2.110-1 2022-12-01T10:20:22.2667849Z NV_LIBCUBLAS_PACKAGE=libcublas-11-6=11.9.2.110-1 2022-12-01T10:20:22.2668335Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2022-12-01T10:20:22.2668665Z UCX_HOME=/usr 2022-12-01T10:20:22.2668920Z PYTORCH_RETRY_TEST_CASES=1 2022-12-01T10:20:22.2669257Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2022-12-01T10:20:22.2669595Z BASE_SHA=ae2fe4033cf3b17259b17f351020b988fa893f91 2022-12-01T10:20:22.2669939Z NV_CUDA_CUDART_DEV_VERSION=11.6.55-1 2022-12-01T10:20:22.2671210Z PR_BODY=Link to landed master PR (if applicable):https://github.com/pytorch/pytorch/pull/88993Criteria category:1: This prevents a crash, which was introduced [here](https://github.com/pytorch/pytorch/commit/4b7de265569f7fd731dd1cfea83ce804cc22f7c0#diff-45b117ca26b4fc9174fbe0d7a9cd8cb1c43964cd5e2bb20c7778ee00a942ef63), tagged for 1.132: Prevents a crashIm hoping this is a low-risk change, since its just changing one method for its safer form. 
2022-12-01T10:20:22.2672106Z GITHUB_BASE_REF=release/1.13 2022-12-01T10:20:22.2672365Z TERM=xterm 2022-12-01T10:20:22.2672573Z XLA_CUDA= 2022-12-01T10:20:22.2672848Z NV_NVML_DEV_VERSION=11.6.55-1 2022-12-01T10:20:22.2673126Z TORCH_CUDA_ARCH_LIST=Maxwell 2022-12-01T10:20:22.2673372Z CUDA_VERSION=11.6.2 2022-12-01T10:20:22.2673811Z NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-6 2022-12-01T10:20:22.2674115Z OPENSSL_ROOT_DIR=/opt/openssl 2022-12-01T10:20:22.2674645Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_2385ff8c-424a-4358-abd6-a3cab2535f9d 2022-12-01T10:20:22.2675038Z GITHUB_JOB=test 2022-12-01T10:20:22.2675298Z SCCACHE_S3_KEY_PREFIX=pull 2022-12-01T10:20:22.2691824Z COMMIT_MESSAGES=+ 56508e29e6b1847e050462d818021a0389d0de99 Release 1.13, Install torch from test channel, Pin build… (#86290)- f682048bf9475d324044e955d8d67464ee446c3a Fix for the binary upload (#86385)+ 95112ca043c37e38101e629c1f69663e911cc909 Fix binary builds for the release - unblock release (#86484)- c38dbd0e1dd88cce783c8dce6cce1a97276b6bb9 Conditionally build the TestApp benchmark based on lite interpreter (#86314) (#86377)- 50a9cd95baf29e136fdf5f027c6f4c4a7ccd5b9b Add version selector back to functorch docs (#86602) (#86689)- 5be45fc4f1a86624fe5ce0e99a2a5558011ca69e [ROCm] set nvfuser default to disabled, keep CI (#86369) (#86725)- f37023b03f3e92dadb247e89fd4e024eb4a0eb8a [MPS] Better error message for `slow_conv2d_forward` (#86844)- 786431cd13d171965feeb17240cc7bf8dc509dba [DOC] Use type hints to show annotation in the docs (#86851)- 03992a6fb3dbd371fe41d3cd95c6589b45a10f14 Make the data types of output and input consistenst for batchnorm (#86784)+ d9ddab5efa2224704e4b982e651589b31b1c1547 [1.13] Remove torch.vmap (#86333)+ 01042487e2d848d17351555ea73c61d38a60a7f9 [1.13] Release-only improvements to functorch docs (#86693)- 71251e2521ac4b737af43d536dff359df256eadd ci: Just use regular checkout (#86824) (#86895)+ f65ac2ad20f8eab95a0d0fb93c3550afc2b86d91 [CI] Fix builder ref for release, linux only (#86904)- 51c16f092ca8d7802decfd56d34ae99d1c0f335c [functorch] Add more details to the functorch install page (#86823) (#86903)- 311b47b72c73509a82ed0b613a7c563e6628332b [quant] Move the order of x86 engine to avoid changing the default qengine (#86631) (#86726)+ 7e196948ca0c5ede7fa6e0831fda18135e6573a5 Add vmap support for slogdet; fix regression from functorch 0.2.1 (#86815) (#86902)- aab841e89d850c9da9048a64541cfa78b222f1ff [ONNX] Support device().type() string comparison with constant (#86168) (#86921)- de3fa48e38e5370d18748e131bd20db53390ea5c Install c10d headers with absolute path (#86257) (#86933)+ 366c59e2606368a6f2ee7165d58686d0d494dd92 Allow PrivateUse1 backends to not have Storage (#86557) (#86803)+ 492d572086a93a31fc8972d29726453ee1e46717 [functorch] fix cross (#86934)- f89a762a8efcca77722071ec042a02742ec4537d Fix the performance issue that the for-loop before ExternallCall could not be parallelized. 
(#85056) (#86516)- c40a04415c59edc45b2ceba682377b255f2091c7 Add error checking to flaky test bot platform parser (#86632) (#87201)+ eda8263f151a17d0d81d33e3f48fb2102daf4c14 [Release 1.13.0][DataPipe] Documentation and interface fix (#87140)+ d2325f3b6a17e54b5d48c68d82b74354dfd35f61 handle libomp update on circleci (#86979) (#87132)- 7688e5eeabce25b67aa99280772d094b1c5f7091 [ONNX] Fix triu/tril export with diagonal input (#86843) (#86925)- 1ebff1dcaf75d0305dc713eea104ddfa3ae6b588 [ONNX] Renable assert diagnostic test (#85999) (#86924)- 1442f04f3bab3b0e0f22fbb68c8f328e17801c69 Enables lazy loading of cuda modules if not set by user (#86509)+ d324341922beb064057e12daae19199480885d5e Bug fix for torch._C._willEngineExecuteNode (#86672)- 894bad72c9dfb946e9bdfebaf2db41473b3dd4fe [ONNX] Fix scalar_type_analysis metadata for copied constant (#86716) (#86923)- ef6dd0d34d379604eec8db34dd58581b8b061249 [ONNX] Ignore print(Tensor) during tracing (#86223) (#87122)- e6d8c1392c817636c15ad974dd160292cdc471bc [einsum] Fix opt_einsum defaults to be more reasonable (#86985) (#87144)+ 3af7c7fc98af441ca90ef38b262497b530a317fb [einsum] fix MPS regression and fix incorrect contraction order when path is None (#87261)- 9e3df4920f75fa2aaf5bbe04ed302fc80008a8b9 Improve NestedTensor documentation (#85186) (#87337)- 3ca59f4b8aff32fbad3fcd13b812b5335047b8ce Reenable aot tests on windows for cuda 11.7 and up (#87193) (#87307)- 0948dbb3e7d6e5e99305b71423f7e9a45a6e5e14 Add torch.sparse overview section (#85265) (#87376)- 1f80ac7fa72cacafbb4901109b465db05f91d818 [Docs] Update mm family ops and F.linear to note limited sparse support. (#86220) (#87379)- 7342a3b0eac2e3ebf4d9b5ce1c3c73c4134c8751 Add prototype warning to MaskedTensor constructor (#87107) (#87380)- aba5544affda233f8413b5fd74df1cdf7ed91c18 [maskedtensor] add docs (#84887) (#87381)- 6e2cb22d74d1939e0aa5c00c665a58e8796bef31 [functorch][docs] Downgrade the warning about forward-mode AD coverage (#87383) (#87386)+ ca0ee922bc9440bfc86b332d69ee0559f748bc6e [release] Add warning to stateless.functional_call for deprecated behavior (#87079)- d2ce4310bad94f2e30fd75cb14e175bcef7ba6fa Assert if padding mask type is unexpected (#87106) (#87384)- 05cb998e11d109a55ddee0fb72c17aa07fa8104a [MPS] Do not dispatch empty job in `bitwise_not` (#87286) (#87427)- 385022c76e4229a9393d63871225e0961e91c2db [ci] handle libomp upgrade on github (#87382) (#87408)- 3ffddc0b2d5fc963463c98b518b0eb1f88788a74 [functorch] fix AOTAutograd tutorial (#87415) (#87434)+ 55c76baf579cb6593f87d1a23e9a49afeb55f15a [1.13] Fix functorch version docs (#87392)- 59686b4c60289188436ac61576fd389821858fa5 [maskedtensor] fix docs formatting (#87387) (#87406)+ 0c0df0be7497112022804df03aeeb0fcbadc9243 Add `weights_only` option to `torch.load` (#87443)+ d253eb29d86af51ae17b950825ffdb5661b5af7f Avoid calling logging.basicConfig (#86959) (#87455)- d3aecbd9bc58a42366abffa63748d73c872bd927 Delete torch::deploy from pytorch core (#85953) (#85953) (#87454)- 51fa4fae41367a17f49ce648c7cdd6aa72f6e6ac Move PadNd from ATen/native to ATen (#87456)- f6c42ae2c29ba788523149dbcc791bf14530f93d Reenable `isinstance` with `torch.distributed.ReduceOp` (#87303) (#87463)- 6a8be2cb630ce79d654a707e5ea454d013acbda1 [ONNX] Reland: Update training state logic to support ScriptedModule (#86745) (#87457)+ 8569a44f38ff906103f40c6925dec366c7557943 [MPS] Revamp copy_to_mps_ implementation (#87475)- fdb18da4cc3ee04c2423eb33b2780edc4cae64e0 Fix distributed issue by including distributed files (#87612)- 341c377c0f595ff4e4a0fbdef21978c038b64b98 Add 
General Project Policies (#87385) (#87613)- 4e1a4b150a4ccd5f401a4293a0336a9a8a1dad2d fix docs push (#87498) (#87628)- 7c98e70d44abc7a1aead68b6ea6c8adc8c554db5 attempted fix for nvrtc with lovelace (#87611) (#87618)- 74a9ca993bd79f8131829e9c946657fa9a1d05ef [JIT][Security] Do not blindly eval input string (#89189) (#89925)+ ae2fe4033cf3b17259b17f351020b988fa893f91 Update masked.rst (#89758) (#89923)- c13d400bffe90e16b96520bbc8a41a6f0c9cd584 Use the Python frame safely in _pythonCallstack 2022-12-01T10:20:22.2699410Z NVIDIA_DRIVER_CAPABILITIES=compute,utility 2022-12-01T10:20:22.2699706Z NUM_TEST_SHARDS=3 2022-12-01T10:20:22.2699957Z PR_NUMBER=89997 2022-12-01T10:20:22.2700505Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_2385ff8c-424a-4358-abd6-a3cab2535f9d 2022-12-01T10:20:22.2700888Z SHLVL=1 2022-12-01T10:20:22.2701236Z NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-11-6 2022-12-01T10:20:22.2701570Z GITHUB_REPOSITORY=pytorch/pytorch 2022-12-01T10:20:22.2702164Z NVIDIA_REQUIRE_CUDA=cuda>=11.6 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 2022-12-01T10:20:22.2702759Z NV_LIBNPP_DEV_VERSION=11.6.3.124-1 2022-12-01T10:20:22.2703075Z SHA1=c13d400bffe90e16b96520bbc8a41a6f0c9cd584 2022-12-01T10:20:22.2703363Z GITHUB_EVENT_NAME=pull_request 2022-12-01T10:20:22.2703711Z NV_CUDA_CUDART_VERSION=11.6.55-1 2022-12-01T10:20:22.2704061Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2022-12-01T10:20:22.2704354Z GITHUB_RUN_NUMBER=69319 2022-12-01T10:20:22.2704598Z GITHUB_WORKFLOW=pull 2022-12-01T10:20:22.2705015Z PATH=/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2022-12-01T10:20:22.2705471Z NV_LIBNCCL_DEV_PACKAGE_VERSION=2.12.10-1 2022-12-01T10:20:22.2705981Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2022-12-01T10:20:22.2706383Z GITHUB_TRIGGERING_ACTOR=charlie-wt 2022-12-01T10:20:22.2706652Z _=/usr/bin/env 2022-12-01T10:20:22.2706939Z + echo 'Testing pytorch' 2022-12-01T10:20:22.2707182Z Testing pytorch 2022-12-01T10:20:22.2707453Z + export LANG=C.UTF-8 2022-12-01T10:20:22.2707722Z + LANG=C.UTF-8 2022-12-01T10:20:22.2707946Z + PR_NUMBER=89997 2022-12-01T10:20:22.2708214Z + [[ distributed == \d\e\f\a\u\l\t ]] 2022-12-01T10:20:22.2708513Z + [[ distributed == \d\i\s\t\r\i\b\u\t\e\d ]] 2022-12-01T10:20:22.2708898Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *rocm* ]] 2022-12-01T10:20:22.2709224Z + [[ distributed == \s\l\o\w ]] 2022-12-01T10:20:22.2709638Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *slow-gradcheck* ]] 2022-12-01T10:20:22.2710064Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *cuda* ]] 2022-12-01T10:20:22.2710417Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2022-12-01T10:20:22.2710736Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2022-12-01T10:20:22.2711132Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *cuda11* ]] 2022-12-01T10:20:22.2711455Z + export BUILD_SPLIT_CUDA=ON 2022-12-01T10:20:22.2711725Z + BUILD_SPLIT_CUDA=ON 2022-12-01T10:20:22.2711979Z + [[ distributed == *crossref* ]] 2022-12-01T10:20:22.2712262Z + [[ distributed == *dynamo* ]] 2022-12-01T10:20:22.2712552Z + [[ -n 89997 ]] 2022-12-01T10:20:22.2712795Z + [[ -z '' ]] 2022-12-01T10:20:22.2713084Z + export PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=1 2022-12-01T10:20:22.2713409Z + 
PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=1 2022-12-01T10:20:22.2713795Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *rocm* ]] 2022-12-01T10:20:22.2714236Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 != *-bazel-* ]] 2022-12-01T10:20:22.2714599Z + pip_install --user ninja 2022-12-01T10:20:22.2714962Z + pip install --progress-bar off --user ninja 2022-12-01T10:20:22.8444545Z Collecting ninja 2022-12-01T10:20:22.8692133Z Downloading ninja-1.11.1-py2.py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (145 kB) 2022-12-01T10:20:23.7654261Z Installing collected packages: ninja 2022-12-01T10:20:23.7768557Z  WARNING: The script ninja is installed in '/var/lib/jenkins/.local/bin' which is not on PATH. 2022-12-01T10:20:23.7769428Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 2022-12-01T10:20:23.7829270Z Successfully installed ninja-1.11.1 2022-12-01T10:20:23.8537250Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2022-12-01T10:20:23.8537898Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2022-12-01T10:20:23.8538688Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *asan* ]] 2022-12-01T10:20:23.8539059Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2022-12-01T10:20:23.8539392Z + [[ distributed == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2022-12-01T10:20:23.8545635Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *tbb* ]] 2022-12-01T10:20:23.8560262Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *libtorch* ]] 2022-12-01T10:20:23.8560732Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *-bazel-* ]] 2022-12-01T10:20:23.8563818Z + cd test 2022-12-01T10:20:23.8564197Z + python -c 'import torch; print(torch.__config__.show())' 2022-12-01T10:20:25.1431365Z PyTorch built with: 2022-12-01T10:20:25.1431827Z - GCC 7.5 2022-12-01T10:20:25.1432149Z - C++ Version: 201402 2022-12-01T10:20:25.1432688Z - Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2022-12-01T10:20:25.1433249Z - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815) 2022-12-01T10:20:25.1433666Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2022-12-01T10:20:25.1434043Z - LAPACK is enabled (usually provided by MKL) 2022-12-01T10:20:25.1434362Z - NNPACK is enabled 2022-12-01T10:20:25.1434999Z - CPU capability usage: AVX2 2022-12-01T10:20:25.1435308Z - CUDA Runtime 11.6 2022-12-01T10:20:25.1435682Z - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52 2022-12-01T10:20:25.1436094Z - CuDNN 8.3.2 (built against CUDA 11.5) 2022-12-01T10:20:25.1436451Z - Magma 2.6.1 2022-12-01T10:20:25.1439471Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Werror -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 2022-12-01T10:20:25.1441736Z 2022-12-01T10:20:25.3285075Z + cd test 2022-12-01T10:20:25.3285764Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2022-12-01T10:20:26.5478026Z ATen/Parallel: 2022-12-01T10:20:26.5478370Z at::get_num_threads() : 16 2022-12-01T10:20:26.5478676Z at::get_num_interop_threads() : 16 2022-12-01T10:20:26.5478974Z OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2022-12-01T10:20:26.5479258Z omp_get_max_threads() : 16 2022-12-01T10:20:26.5479901Z Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2022-12-01T10:20:26.5480291Z mkl_get_max_threads() : 16 2022-12-01T10:20:26.5481033Z Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815) 2022-12-01T10:20:26.5481419Z std::thread::hardware_concurrency() : 32 2022-12-01T10:20:26.5481714Z Environment variables: 2022-12-01T10:20:26.5481988Z OMP_NUM_THREADS : [not set] 2022-12-01T10:20:26.5482259Z MKL_NUM_THREADS : [not set] 2022-12-01T10:20:26.5482927Z ATen parallel backend: OpenMP 2022-12-01T10:20:26.5483113Z 2022-12-01T10:20:26.7220002Z + [[ distributed == *backward* ]] 2022-12-01T10:20:26.7220329Z + [[ distributed == *xla* ]] 2022-12-01T10:20:26.7220621Z + [[ distributed == \j\i\t\_\l\e\g\a\c\y ]] 2022-12-01T10:20:26.7221159Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *libtorch* ]] 2022-12-01T10:20:26.7221507Z + [[ distributed == distributed ]] 2022-12-01T10:20:26.7221767Z + install_torchdynamo 2022-12-01T10:20:26.7222037Z + local commit 2022-12-01T10:20:26.7224878Z ++ get_pinned_commit torchdynamo 2022-12-01T10:20:26.7225197Z ++ cat .github/ci_commit_pins/torchdynamo.txt 2022-12-01T10:20:26.7241219Z + commit=6ead5cae0d1234aa64db06fe230ef56e12ec76fe 2022-12-01T10:20:26.7241873Z + pip_install --user git+https://github.com/pytorch/torchdynamo.git@6ead5cae0d1234aa64db06fe230ef56e12ec76fe 2022-12-01T10:20:26.7243277Z + pip install --progress-bar off --user git+https://github.com/pytorch/torchdynamo.git@6ead5cae0d1234aa64db06fe230ef56e12ec76fe 2022-12-01T10:20:27.2229697Z Collecting git+https://github.com/pytorch/torchdynamo.git@6ead5cae0d1234aa64db06fe230ef56e12ec76fe 2022-12-01T10:20:27.2235694Z Cloning https://github.com/pytorch/torchdynamo.git (to revision 6ead5cae0d1234aa64db06fe230ef56e12ec76fe) to /tmp/pip-req-build-pyx9pzto 2022-12-01T10:20:27.2257623Z Running command git clone --filter=blob:none --quiet https://github.com/pytorch/torchdynamo.git /tmp/pip-req-build-pyx9pzto 2022-12-01T10:20:27.8635909Z Running command git rev-parse -q --verify 'sha^6ead5cae0d1234aa64db06fe230ef56e12ec76fe' 2022-12-01T10:20:27.8658548Z Running command git fetch -q https://github.com/pytorch/torchdynamo.git 6ead5cae0d1234aa64db06fe230ef56e12ec76fe 2022-12-01T10:20:28.1499811Z Running command git checkout -q 6ead5cae0d1234aa64db06fe230ef56e12ec76fe 2022-12-01T10:20:28.6758438Z Resolved https://github.com/pytorch/torchdynamo.git to commit 6ead5cae0d1234aa64db06fe230ef56e12ec76fe 2022-12-01T10:20:30.6949368Z Preparing metadata (setup.py) ... 
[?25l- done 2022-12-01T10:20:30.7018722Z [?25hRequirement already satisfied: torch>=1.12.0 in /opt/conda/lib/python3.10/site-packages (from torchdynamo==1.13.0.dev0) (1.13.0a0+gitc13d400) 2022-12-01T10:20:30.7023259Z Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from torchdynamo==1.13.0.dev0) (1.21.2) 2022-12-01T10:20:30.7471741Z Collecting tabulate 2022-12-01T10:20:30.7664570Z Downloading tabulate-0.9.0-py3-none-any.whl (35 kB) 2022-12-01T10:20:30.7740722Z Requirement already satisfied: pyyaml in /opt/conda/lib/python3.10/site-packages/PyYAML-6.0-py3.10-linux-x86_64.egg (from torchdynamo==1.13.0.dev0) (6.0) 2022-12-01T10:20:30.7745388Z Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torchdynamo==1.13.0.dev0) (1.11.1) 2022-12-01T10:20:30.7785013Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch>=1.12.0->torchdynamo==1.13.0.dev0) (4.3.0) 2022-12-01T10:20:30.7820835Z Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torchdynamo==1.13.0.dev0) (1.2.1) 2022-12-01T10:20:30.7946361Z Building wheels for collected packages: torchdynamo 2022-12-01T10:20:35.0956407Z Building wheel for torchdynamo (setup.py) ... [?25l- \ | / - done 2022-12-01T10:20:35.1058509Z [?25h Created wheel for torchdynamo: filename=torchdynamo-1.13.0.dev0-cp310-cp310-linux_x86_64.whl size=2693820 sha256=c65cf7a900469db649ecab92525819498257a837ad8a8c0f703a147d1762b49b 2022-12-01T10:20:35.1059571Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/6c/b6/c8/fbfe87a581aa24ccebb62be69424794eca3c035823c5b100d9 2022-12-01T10:20:35.1082198Z Successfully built torchdynamo 2022-12-01T10:20:35.9827191Z Installing collected packages: tabulate, torchdynamo 2022-12-01T10:20:38.1682781Z Successfully installed tabulate-0.9.0 torchdynamo-1.13.0.dev0 2022-12-01T10:20:38.2623832Z + test_distributed 2022-12-01T10:20:38.2624329Z + echo 'Testing distributed python tests' 2022-12-01T10:20:38.2624638Z Testing distributed python tests 2022-12-01T10:20:38.2625086Z + python test/run_test.py --distributed-tests --shard 3 3 --verbose 2022-12-01T10:20:40.0193023Z Ignoring disabled issues: [] 2022-12-01T10:20:40.0572963Z /var/lib/jenkins/workspace/test/run_test.py:1050: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. 
2022-12-01T10:20:40.0573513Z if torch.version.cuda is not None and LooseVersion(torch.version.cuda) >= "11.6": 2022-12-01T10:20:40.0577559Z Found test time stats from artifacts 2022-12-01T10:20:40.0588478Z Selected tests: 2022-12-01T10:20:40.0588726Z distributed/test_c10d_nccl 2022-12-01T10:20:40.0589038Z distributed/fsdp/test_fsdp_core 2022-12-01T10:20:40.0589343Z distributed/fsdp/test_fsdp_state_dict 2022-12-01T10:20:40.0589660Z distributed/fsdp/test_fsdp_optim_state 2022-12-01T10:20:40.0589990Z distributed/_shard/sharded_tensor/test_sharded_tensor 2022-12-01T10:20:40.0590323Z distributed/test_c10d_pypg 2022-12-01T10:20:40.0590617Z distributed/fsdp/test_fsdp_checkpoint 2022-12-01T10:20:40.0590900Z distributed/fsdp/test_fsdp_misc 2022-12-01T10:20:40.0591185Z distributed/test_pg_wrapper 2022-12-01T10:20:40.0591476Z distributed/fsdp/test_fsdp_grad_acc 2022-12-01T10:20:40.0591781Z distributed/fsdp/test_fsdp_freezing_weights 2022-12-01T10:20:40.0592096Z distributed/test_c10d_spawn_gloo 2022-12-01T10:20:40.0592390Z distributed/fsdp/test_fsdp_exec_order 2022-12-01T10:20:40.0592945Z distributed/algorithms/ddp_comm_hooks/test_ddp_hooks 2022-12-01T10:20:40.0593310Z distributed/_shard/sharded_tensor/ops/test_matrix_ops 2022-12-01T10:20:40.0593652Z distributed/fsdp/test_fsdp_clip_grad_norm 2022-12-01T10:20:40.0593982Z distributed/fsdp/test_fsdp_ignored_modules 2022-12-01T10:20:40.0594284Z distributed/test_c10d_object_collectives 2022-12-01T10:20:40.0594588Z distributed/fsdp/test_fsdp_apply 2022-12-01T10:20:40.0594918Z distributed/_shard/sharded_tensor/ops/test_tensor_ops 2022-12-01T10:20:40.0595230Z distributed/_shard/test_partial_tensor 2022-12-01T10:20:40.0595553Z distributed/elastic/timer/local_timer_example 2022-12-01T10:20:40.0595889Z distributed/fsdp/test_distributed_checkpoint 2022-12-01T10:20:40.0596213Z distributed/_shard/sharded_tensor/ops/test_linear 2022-12-01T10:20:40.0596560Z distributed/_shard/sharded_tensor/ops/test_softmax 2022-12-01T10:20:40.0596921Z distributed/_shard/sharded_tensor/ops/test_embedding_bag 2022-12-01T10:20:40.0597278Z distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 2022-12-01T10:20:40.0597640Z distributed/fsdp/test_fsdp_multiple_forward 2022-12-01T10:20:40.0597953Z distributed/fsdp/test_fsdp_uneven 2022-12-01T10:20:40.0598271Z distributed/elastic/timer/local_timer_test 2022-12-01T10:20:40.0598579Z distributed/elastic/utils/distributed_test 2022-12-01T10:20:40.0598890Z distributed/rpc/test_share_memory 2022-12-01T10:20:40.0599202Z distributed/fsdp/test_checkpoint_wrapper 2022-12-01T10:20:40.0599498Z distributed/elastic/utils/util_test 2022-12-01T10:20:40.0599817Z distributed/nn/jit/test_instantiator 2022-12-01T10:20:40.0600113Z distributed/fsdp/test_fsdp_fx 2022-12-01T10:20:40.0600372Z distributed/test_launcher 2022-12-01T10:20:40.0600679Z distributed/_shard/checkpoint/test_checkpoint 2022-12-01T10:20:40.0601013Z distributed/_shard/checkpoint/test_planner 2022-12-01T10:20:40.0601321Z distributed/_shard/test_replicated_tensor 2022-12-01T10:20:40.0601628Z distributed/elastic/timer/api_test 2022-12-01T10:20:40.0601932Z distributed/fsdp/test_shard_utils 2022-12-01T10:20:40.0602250Z distributed/pipeline/sync/skip/test_inspect_skip_layout 2022-12-01T10:20:40.0602922Z distributed/pipeline/sync/skip/test_stash_pop 2022-12-01T10:20:40.0603366Z distributed/pipeline/sync/test_balance 2022-12-01T10:20:40.0603696Z distributed/pipeline/sync/test_copy 2022-12-01T10:20:40.0603991Z distributed/pipeline/sync/test_inplace 2022-12-01T10:20:40.0604298Z 
distributed/pipeline/sync/test_pipe 2022-12-01T10:20:40.0604631Z distributed/pipeline/sync/test_transparency 2022-12-01T10:20:40.0604934Z distributed/rpc/test_tensorpipe_agent 2022-12-01T10:20:40.2591940Z Prioritized test from test file changes. 2022-12-01T10:20:40.2592297Z reordering tests for PR: 2022-12-01T10:20:40.2594342Z prioritized: ['distributed/test_c10d_nccl', 'distributed/fsdp/test_fsdp_core', 'distributed/fsdp/test_fsdp_state_dict', 'distributed/fsdp/test_fsdp_optim_state', 'distributed/fsdp/test_fsdp_checkpoint', 'distributed/fsdp/test_fsdp_misc', 'distributed/fsdp/test_fsdp_grad_acc', 'distributed/fsdp/test_fsdp_freezing_weights', 'distributed/fsdp/test_fsdp_exec_order', 'distributed/algorithms/ddp_comm_hooks/test_ddp_hooks', 'distributed/fsdp/test_fsdp_clip_grad_norm', 'distributed/fsdp/test_fsdp_ignored_modules', 'distributed/fsdp/test_fsdp_apply', 'distributed/fsdp/test_distributed_checkpoint', 'distributed/fsdp/test_fsdp_multiple_forward', 'distributed/fsdp/test_fsdp_uneven', 'distributed/fsdp/test_checkpoint_wrapper', 'distributed/fsdp/test_fsdp_fx', 'distributed/_shard/checkpoint/test_checkpoint', 'distributed/_shard/checkpoint/test_planner'] 2022-12-01T10:20:40.2598208Z the rest: ['distributed/_shard/sharded_tensor/test_sharded_tensor', 'distributed/test_c10d_pypg', 'distributed/test_pg_wrapper', 'distributed/test_c10d_spawn_gloo', 'distributed/_shard/sharded_tensor/ops/test_matrix_ops', 'distributed/test_c10d_object_collectives', 'distributed/_shard/sharded_tensor/ops/test_tensor_ops', 'distributed/_shard/test_partial_tensor', 'distributed/elastic/timer/local_timer_example', 'distributed/_shard/sharded_tensor/ops/test_linear', 'distributed/_shard/sharded_tensor/ops/test_softmax', 'distributed/_shard/sharded_tensor/ops/test_embedding_bag', 'distributed/_shard/sharded_tensor/test_sharded_tensor_reshard', 'distributed/elastic/timer/local_timer_test', 'distributed/elastic/utils/distributed_test', 'distributed/rpc/test_share_memory', 'distributed/elastic/utils/util_test', 'distributed/nn/jit/test_instantiator', 'distributed/test_launcher', 'distributed/_shard/test_replicated_tensor', 'distributed/elastic/timer/api_test', 'distributed/fsdp/test_shard_utils', 'distributed/pipeline/sync/skip/test_inspect_skip_layout', 'distributed/pipeline/sync/skip/test_stash_pop', 'distributed/pipeline/sync/test_balance', 'distributed/pipeline/sync/test_copy', 'distributed/pipeline/sync/test_inplace', 'distributed/pipeline/sync/test_pipe', 'distributed/pipeline/sync/test_transparency', 'distributed/rpc/test_tensorpipe_agent'] 2022-12-01T10:20:40.2600391Z 2022-12-01T10:20:40.2600936Z Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/slow-tests.json to /var/lib/jenkins/workspace/test/.pytorch-slow-tests.json 2022-12-01T10:20:40.2833257Z Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2022-12-01T10:20:40.3009088Z parallel (file granularity) tests: 2022-12-01T10:20:40.3009371Z 2022-12-01T10:20:40.3009618Z serial (file granularity) tests: 2022-12-01T10:20:40.3009898Z distributed/test_c10d_nccl 2022-12-01T10:20:40.3010188Z distributed/fsdp/test_fsdp_core 2022-12-01T10:20:40.3010496Z distributed/fsdp/test_fsdp_state_dict 2022-12-01T10:20:40.3010793Z distributed/fsdp/test_fsdp_optim_state 2022-12-01T10:20:40.3011106Z distributed/fsdp/test_fsdp_checkpoint 2022-12-01T10:20:40.3011406Z distributed/fsdp/test_fsdp_misc 
2022-12-01T10:20:40.3011728Z distributed/fsdp/test_fsdp_grad_acc 2022-12-01T10:20:40.3012031Z distributed/fsdp/test_fsdp_freezing_weights 2022-12-01T10:20:40.3012366Z distributed/fsdp/test_fsdp_exec_order 2022-12-01T10:20:40.3012707Z distributed/algorithms/ddp_comm_hooks/test_ddp_hooks 2022-12-01T10:20:40.3013193Z distributed/fsdp/test_fsdp_clip_grad_norm 2022-12-01T10:20:40.3013574Z distributed/fsdp/test_fsdp_ignored_modules 2022-12-01T10:20:40.3013901Z distributed/fsdp/test_fsdp_apply 2022-12-01T10:20:40.3014221Z distributed/fsdp/test_distributed_checkpoint 2022-12-01T10:20:40.3014575Z distributed/fsdp/test_fsdp_multiple_forward 2022-12-01T10:20:40.3014902Z distributed/fsdp/test_fsdp_uneven 2022-12-01T10:20:40.3015232Z distributed/fsdp/test_checkpoint_wrapper 2022-12-01T10:20:40.3015532Z distributed/fsdp/test_fsdp_fx 2022-12-01T10:20:40.3015863Z distributed/_shard/checkpoint/test_checkpoint 2022-12-01T10:20:40.3016215Z distributed/_shard/checkpoint/test_planner 2022-12-01T10:20:40.3016562Z distributed/_shard/sharded_tensor/test_sharded_tensor 2022-12-01T10:20:40.3016893Z distributed/test_c10d_pypg 2022-12-01T10:20:40.3017193Z distributed/test_pg_wrapper 2022-12-01T10:20:40.3017474Z distributed/test_c10d_spawn_gloo 2022-12-01T10:20:40.3017825Z distributed/_shard/sharded_tensor/ops/test_matrix_ops 2022-12-01T10:20:40.3018180Z distributed/test_c10d_object_collectives 2022-12-01T10:20:40.3018522Z distributed/_shard/sharded_tensor/ops/test_tensor_ops 2022-12-01T10:20:40.3018869Z distributed/_shard/test_partial_tensor 2022-12-01T10:20:40.3019210Z distributed/elastic/timer/local_timer_example 2022-12-01T10:20:40.3019549Z distributed/_shard/sharded_tensor/ops/test_linear 2022-12-01T10:20:40.3019912Z distributed/_shard/sharded_tensor/ops/test_softmax 2022-12-01T10:20:40.3020291Z distributed/_shard/sharded_tensor/ops/test_embedding_bag 2022-12-01T10:20:40.3020687Z distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 2022-12-01T10:20:40.3021044Z distributed/elastic/timer/local_timer_test 2022-12-01T10:20:40.3021391Z distributed/elastic/utils/distributed_test 2022-12-01T10:20:40.3021715Z distributed/rpc/test_share_memory 2022-12-01T10:20:40.3022118Z distributed/elastic/utils/util_test 2022-12-01T10:20:40.3022436Z distributed/nn/jit/test_instantiator 2022-12-01T10:20:40.3022733Z distributed/test_launcher 2022-12-01T10:20:40.3023035Z distributed/_shard/test_replicated_tensor 2022-12-01T10:20:40.3023363Z distributed/elastic/timer/api_test 2022-12-01T10:20:40.3023678Z distributed/fsdp/test_shard_utils 2022-12-01T10:20:40.3024016Z distributed/pipeline/sync/skip/test_inspect_skip_layout 2022-12-01T10:20:40.3024396Z distributed/pipeline/sync/skip/test_stash_pop 2022-12-01T10:20:40.3024737Z distributed/pipeline/sync/test_balance 2022-12-01T10:20:40.3025043Z distributed/pipeline/sync/test_copy 2022-12-01T10:20:40.3025372Z distributed/pipeline/sync/test_inplace 2022-12-01T10:20:40.3025702Z distributed/pipeline/sync/test_pipe 2022-12-01T10:20:40.3026025Z distributed/pipeline/sync/test_transparency 2022-12-01T10:20:40.3026368Z distributed/rpc/test_tensorpipe_agent 2022-12-01T10:20:42.1054280Z Ignoring disabled issues: [] 2022-12-01T10:20:42.1128689Z Ignoring disabled issues: [] 2022-12-01T10:20:42.4136801Z Running distributed/test_c10d_nccl ... [2022-12-01 10:20:42.413196] 2022-12-01T10:20:42.4139788Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_c10d_nccl.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-12-01 10:20:42.413641] 2022-12-01T10:36:33.3120341Z 2022-12-01T10:36:33.3123357Z Expand the folded group to see the log file of distributed/test_c10d_nccl 2022-12-01T10:36:33.3131386Z ##[group]PRINTING LOG FILE of distributed/test_c10d_nccl (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_nccl_q8lksq9h) 2022-12-01T10:36:33.3134218Z , <__main__.CommTest testMethod=test_broadcast_coalesced_nccl>, <__main__.CommTest testMethod=test_nccl_barrier>, <__main__.CommTest testMethod=test_nccl_barrier_device_ids>, <__main__.CommTest testMethod=test_nccl_barrier_device_ids_function_argument>, <__main__.CommTest testMethod=test_nccl_barrier_timeout>, <__main__.CommTest testMethod=test_nccl_barrier_timeout_new_group>, <__main__.CommTest testMethod=test_nccl_barrier_timeout_new_group_non_member>, <__main__.CommTest testMethod=test_nccl_warn_not_in_group_debug_detail>, <__main__.CommTest testMethod=test_nccl_warn_not_in_group_debug_info>, <__main__.CommTest testMethod=test_nccl_warn_not_in_group_debug_off>, <__main__.CommTest testMethod=test_nncl_rank_membership>, <__main__.CommTest testMethod=test_pass_nccl_options_high_priority_stream>, <__main__.CommTest testMethod=test_sequence_num_incremented_nccl_default>, <__main__.CommTest testMethod=test_sequence_num_incremented_nccl_subgroup>, <__main__.CommTest testMethod=test_sequence_num_set_default_pg_nccl>, <__main__.CommTest testMethod=test_sequence_num_set_nccl_new_group>, <__main__.CommTest testMethod=test_tensor_dtype_complex>, <__main__.CommTest testMethod=test_tensor_dtype_mismatch>]> 2022-12-01T10:36:33.3136250Z test_all_reduce_coalesced_nccl (__main__.CommTest) 2022-12-01T10:36:33.3136574Z test_broadcast_coalesced_nccl (__main__.CommTest) 2022-12-01T10:36:33.3136896Z test_nccl_barrier (__main__.CommTest) 2022-12-01T10:36:33.3137203Z test_nccl_barrier_device_ids (__main__.CommTest) 2022-12-01T10:36:33.3137566Z test_nccl_barrier_device_ids_function_argument (__main__.CommTest) 2022-12-01T10:36:33.3138189Z test_nccl_barrier_timeout (__main__.CommTest) 2022-12-01T10:36:33.3138810Z test_nccl_barrier_timeout_new_group (__main__.CommTest) 2022-12-01T10:36:33.3139510Z test_nccl_barrier_timeout_new_group_non_member (__main__.CommTest) 2022-12-01T10:36:33.3140189Z test_nccl_warn_not_in_group_debug_detail (__main__.CommTest) 2022-12-01T10:36:33.3140856Z test_nccl_warn_not_in_group_debug_info (__main__.CommTest) 2022-12-01T10:36:33.3141502Z test_nccl_warn_not_in_group_debug_off (__main__.CommTest) 2022-12-01T10:36:33.3141851Z test_nncl_rank_membership (__main__.CommTest) 2022-12-01T10:36:33.3142503Z test_pass_nccl_options_high_priority_stream (__main__.CommTest) 2022-12-01T10:36:33.3143352Z test_sequence_num_incremented_nccl_default (__main__.CommTest) 2022-12-01T10:36:33.3144099Z test_sequence_num_incremented_nccl_subgroup (__main__.CommTest) 2022-12-01T10:36:33.3144806Z test_sequence_num_set_default_pg_nccl (__main__.CommTest) 2022-12-01T10:36:33.3145407Z test_sequence_num_set_nccl_new_group (__main__.CommTest) 2022-12-01T10:36:33.3146035Z test_tensor_dtype_complex (__main__.CommTest) 2022-12-01T10:36:33.3146628Z test_tensor_dtype_mismatch (__main__.CommTest) 2022-12-01T10:36:33.3148236Z , <__main__.CompilerTest testMethod=test_allreduce_work_wait_gpu>, <__main__.CompilerTest testMethod=test_broadcast_work_wait_gpu>, <__main__.CompilerTest testMethod=test_consecutive_comm_work_wait_gpu>, <__main__.CompilerTest testMethod=test_nested_comm_tensor_wrapping>, <__main__.CompilerTest testMethod=test_reduce_scatter_work_wait_gpu>, 
<__main__.CompilerTest testMethod=test_scatter_work_wait_gpu>]> 2022-12-01T10:36:33.3149800Z test_allgather_work_wait_gpu (__main__.CompilerTest) 2022-12-01T10:36:33.3150453Z test_allreduce_work_wait_gpu (__main__.CompilerTest) 2022-12-01T10:36:33.3151095Z test_broadcast_work_wait_gpu (__main__.CompilerTest) 2022-12-01T10:36:33.3151779Z test_consecutive_comm_work_wait_gpu (__main__.CompilerTest) 2022-12-01T10:36:33.3152439Z test_nested_comm_tensor_wrapping (__main__.CompilerTest) 2022-12-01T10:36:33.3152963Z test_reduce_scatter_work_wait_gpu (__main__.CompilerTest) 2022-12-01T10:36:33.3153403Z test_scatter_work_wait_gpu (__main__.CompilerTest) 2022-12-01T10:36:33.3169217Z , <__main__.DistributedDataParallelTest testMethod=test_accumulate_gradients_module_with_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_arbitrary_forward_return_value>, <__main__.DistributedDataParallelTest testMethod=test_arbitrary_forward_return_value_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_bf16_compress_wrapper_is_view>, <__main__.DistributedDataParallelTest testMethod=test_bf16_compress_wrapper_nccl>, <__main__.DistributedDataParallelTest testMethod=test_builtin_ddp_comm_hooks_nccl>, <__main__.DistributedDataParallelTest testMethod=test_builtin_ddp_comm_hooks_nccl_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_channels_last_contig>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_dynamic_module>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_dynamic_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_hook_nccl>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_hook_nccl_static_graph>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_with_then_hook_nccl>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_future_passing_gpu_nccl>, <__main__.DistributedDataParallelTest testMethod=test_ddp_multi_device_module_config>, <__main__.DistributedDataParallelTest testMethod=test_ddp_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_with_lazy_parameters>, <__main__.DistributedDataParallelTest testMethod=test_default_ddp_comm_hooks_nccl>, 
<__main__.DistributedDataParallelTest testMethod=test_default_ddp_comm_hooks_nccl_is_view>, <__main__.DistributedDataParallelTest testMethod=test_failure_recovery>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_debug_detail>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_debug_info>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_debug_off>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_grad_is_view_debug_detail>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_grad_is_view_debug_info>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_grad_is_view_debug_off>, <__main__.DistributedDataParallelTest testMethod=test_fp16>, <__main__.DistributedDataParallelTest testMethod=test_fp16_compress_wrapper_is_view>, <__main__.DistributedDataParallelTest testMethod=test_fp16_compress_wrapper_nccl>, <__main__.DistributedDataParallelTest testMethod=test_fp16_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_grad_layout_1devicemodule_1replicaperprocess>, <__main__.DistributedDataParallelTest testMethod=test_grad_layout_2devicemodule>, <__main__.DistributedDataParallelTest testMethod=test_invalid_powerSGD_state>, <__main__.DistributedDataParallelTest testMethod=test_multiple_outputs_multiple_backward>, <__main__.DistributedDataParallelTest testMethod=test_multiple_outputs_multiple_backward_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_1gpu_module_device_ids_integer_list>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_1gpu_module_device_ids_torch_device_list>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_2gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_4gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_multi_device_ids_not_allowed>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_multi_device_module_device_ids_None>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_single_device_module_device_ids_None>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_single_device_module_empty_device_ids>, <__main__.DistributedDataParallelTest testMethod=test_nccl_propagate_error_reason>, <__main__.DistributedDataParallelTest testMethod=test_no_grad>, <__main__.DistributedDataParallelTest testMethod=test_param_layout_mismatch_error>, <__main__.DistributedDataParallelTest testMethod=test_pass_default_pg>, <__main__.DistributedDataParallelTest testMethod=test_powerSGD_ddp_comm_hook_nccl>, <__main__.DistributedDataParallelTest testMethod=test_powerSGD_ddp_comm_hook_nccl_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_empty_input>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_only_empty_input>]> 2022-12-01T10:36:33.3185263Z test_accumulate_gradients_module (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3186179Z test_accumulate_gradients_module_with_grad_is_view (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3187072Z test_arbitrary_forward_return_value (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3187664Z test_arbitrary_forward_return_value_grad_is_view (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3188185Z test_bf16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) 
2022-12-01T10:36:33.3188716Z test_bf16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3189169Z test_builtin_ddp_comm_hooks_nccl (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3189606Z test_builtin_ddp_comm_hooks_nccl_grad_is_view (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3190040Z test_channels_last_contig (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3190490Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3190954Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3191431Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3191916Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3192426Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3192962Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3193486Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3193973Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3194448Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3194933Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3195453Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3195968Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3196473Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3196936Z test_ddp_comm_hook_allreduce_hook_nccl (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3197402Z test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3197964Z test_ddp_comm_hook_allreduce_hook_nccl_static_graph (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3198440Z test_ddp_comm_hook_allreduce_with_then_hook_nccl (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3198909Z test_ddp_comm_hook_future_passing_gpu_nccl (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3199364Z test_ddp_multi_device_module_config (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3199792Z test_ddp_weight_sharing (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3200209Z test_ddp_with_lazy_parameters (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3200636Z test_default_ddp_comm_hooks_nccl (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3201099Z test_default_ddp_comm_hooks_nccl_is_view (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3201508Z test_failure_recovery (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3201951Z test_find_unused_parameters_kwarg_debug_detail (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3202889Z test_find_unused_parameters_kwarg_debug_info (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3203731Z test_find_unused_parameters_kwarg_debug_off (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3204708Z 
test_find_unused_parameters_kwarg_grad_is_view_debug_detail (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3205484Z test_find_unused_parameters_kwarg_grad_is_view_debug_info (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3205991Z test_find_unused_parameters_kwarg_grad_is_view_debug_off (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3206425Z test_fp16 (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3206815Z test_fp16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3207419Z test_fp16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3207831Z test_fp16_grad_is_view (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3208282Z test_grad_layout_1devicemodule_1replicaperprocess (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3208742Z test_grad_layout_2devicemodule (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3209176Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3209608Z test_multiple_outputs_multiple_backward (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3210094Z test_multiple_outputs_multiple_backward_grad_is_view (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3210581Z test_nccl_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3211081Z test_nccl_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3211539Z test_nccl_backend_2gpu_module (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3211969Z test_nccl_backend_4gpu_module (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3212419Z test_nccl_backend_multi_device_ids_not_allowed (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3212909Z test_nccl_backend_multi_device_module_device_ids_None (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3213380Z test_nccl_backend_single_device_module_device_ids_None (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3213884Z test_nccl_backend_single_device_module_empty_device_ids (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3214343Z test_nccl_propagate_error_reason (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3214741Z test_no_grad (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3215148Z test_param_layout_mismatch_error (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3217046Z test_pass_default_pg (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3217485Z test_powerSGD_ddp_comm_hook_nccl (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3217938Z test_powerSGD_ddp_comm_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3218449Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3218923Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3219303Z 2022-12-01T10:36:33.3220567Z , <__main__.NcclErrorHandlingTest testMethod=test_nccl_blocking_wait_with_barrier>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_abort>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_clean_exit>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_nonzero_exit>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_sigkill>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_sigterm>, 
<__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_nonblocking>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_timeout>]> 2022-12-01T10:36:33.3221788Z test_invalid_nccl_blocking_wait_env (__main__.NcclErrorHandlingTest) 2022-12-01T10:36:33.3222198Z test_nccl_blocking_wait_with_barrier (__main__.NcclErrorHandlingTest) 2022-12-01T10:36:33.3222578Z test_nccl_errors_blocking_abort (__main__.NcclErrorHandlingTest) 2022-12-01T10:36:33.3222980Z test_nccl_errors_blocking_clean_exit (__main__.NcclErrorHandlingTest) 2022-12-01T10:36:33.3223379Z test_nccl_errors_blocking_nonzero_exit (__main__.NcclErrorHandlingTest) 2022-12-01T10:36:33.3223763Z test_nccl_errors_blocking_sigkill (__main__.NcclErrorHandlingTest) 2022-12-01T10:36:33.3224159Z test_nccl_errors_blocking_sigterm (__main__.NcclErrorHandlingTest) 2022-12-01T10:36:33.3224552Z test_nccl_errors_nonblocking (__main__.NcclErrorHandlingTest) 2022-12-01T10:36:33.3224901Z test_nccl_timeout (__main__.NcclErrorHandlingTest) 2022-12-01T10:36:33.3225493Z ]> 2022-12-01T10:36:33.3226087Z test_collectives (__main__.NcclProcessGroupWithDispatchedCollectivesTests) 2022-12-01T10:36:33.3226621Z ]> 2022-12-01T10:36:33.3227058Z test_init_no_gpus (__main__.ProcessGroupNCCLNoGPUTest) 2022-12-01T10:36:33.3228979Z , <__main__.ProcessGroupNCCLTest testMethod=test_allgather_base_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_allgather_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_allreduce_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_barrier>, <__main__.ProcessGroupNCCLTest testMethod=test_broadcast_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_empty_tensors>, <__main__.ProcessGroupNCCLTest testMethod=test_gather_checks>, <__main__.ProcessGroupNCCLTest testMethod=test_gather_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_gather_stress>, <__main__.ProcessGroupNCCLTest testMethod=test_reduce_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_reduce_scatter_base_basics>, <__main__.ProcessGroupNCCLTest testMethod=test_reduce_scatter_base_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_reduce_scatter_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_scatter_checks>, <__main__.ProcessGroupNCCLTest testMethod=test_scatter_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_scatter_stress>, <__main__.ProcessGroupNCCLTest testMethod=test_send_recv>]> 2022-12-01T10:36:33.3230987Z test_allgather_base_basics (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3231366Z test_allgather_base_ops (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3231715Z test_allgather_ops (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3232068Z test_allreduce_ops (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3232419Z test_barrier (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3232749Z test_broadcast_ops (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3233100Z test_empty_tensors (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3233507Z test_gather_checks (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3233856Z test_gather_ops (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3234204Z test_gather_stress (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3234547Z test_reduce_ops (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3234903Z test_reduce_scatter_base_basics (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3235290Z test_reduce_scatter_base_ops (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3235663Z test_reduce_scatter_ops (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3236024Z 
test_scatter_checks (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3236357Z test_scatter_ops (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3236707Z test_scatter_stress (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3237059Z test_send_recv (__main__.ProcessGroupNCCLTest) 2022-12-01T10:36:33.3237462Z ]> 2022-12-01T10:36:33.3237872Z test_common_errors (__main__.RendezvousEnvTest) 2022-12-01T10:36:33.3238194Z 2022-12-01T10:36:33.3238592Z ]> 2022-12-01T10:36:33.3239012Z test_default_store_timeout_nccl (__main__.TimeoutTest) 2022-12-01T10:36:33.3239698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3240150Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3240710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3241177Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3241484Z 2022-12-01T10:36:33.3241596Z Running tests... 2022-12-01T10:36:33.3241987Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3242931Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3243425Z test_all_reduce_coalesced_nccl (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3243888Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 770 2022-12-01T10:36:33.3244313Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 771 2022-12-01T10:36:33.3244925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3245377Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3245931Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3246403Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3246975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3247416Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3247965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3248422Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3248854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3249331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3250318Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1577: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2022-12-01T10:36:33.3250943Z warnings.warn( 2022-12-01T10:36:33.3251917Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1577: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. 
If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2022-12-01T10:36:33.3252555Z warnings.warn( 2022-12-01T10:36:33.3252774Z ok (6.439s) 2022-12-01T10:36:33.3252923Z 2022-12-01T10:36:33.3253192Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3253518Z Ran 1 test in 6.439s 2022-12-01T10:36:33.3253679Z 2022-12-01T10:36:33.3253771Z OK 2022-12-01T10:36:33.3253886Z 2022-12-01T10:36:33.3254011Z Generating XML reports... 2022-12-01T10:36:33.3254551Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102046.xml 2022-12-01T10:36:33.3255213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3255692Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3256271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3256751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3256988Z 2022-12-01T10:36:33.3257101Z Running tests... 2022-12-01T10:36:33.3257485Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3258007Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3258486Z test_broadcast_coalesced_nccl (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3258933Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 887 2022-12-01T10:36:33.3259473Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 888 2022-12-01T10:36:33.3260079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3260539Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3261093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3261572Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3262156Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3262600Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3263151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3263624Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3264059Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3264513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3264855Z ok (6.408s) 2022-12-01T10:36:33.3265024Z 2022-12-01T10:36:33.3265309Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3265646Z Ran 1 test in 6.408s 2022-12-01T10:36:33.3265790Z 2022-12-01T10:36:33.3265885Z OK 2022-12-01T10:36:33.3266018Z 2022-12-01T10:36:33.3266187Z Generating XML reports... 
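The two runs above, test_all_reduce_coalesced_nccl and test_broadcast_coalesced_nccl, exercise the coalesced collectives whose deprecation UserWarning appears in the log. A minimal sketch of the call shape follows, assuming a single-process gloo group so it runs anywhere (the address, port and tensor shapes are illustrative, not taken from the log); the CI job itself drives the same API across two NCCL ranks.

    import torch
    import torch.distributed as dist

    # Single-process group, only to show the call shape of the coalesced
    # collective exercised above; host, port and world size are illustrative.
    dist.init_process_group(
        backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
    )

    tensors = [torch.ones(4), torch.full((2, 2), 3.0)]
    # Reduces every tensor in the list in one fused call; this is the API whose
    # "will be deprecated" UserWarning shows up in the log above.
    dist.all_reduce_coalesced(tensors)

    # The broadcast counterpart tested above goes through an internal helper,
    # so here the public per-tensor broadcast stands in for it.
    for t in tensors:
        dist.broadcast(t, src=0)

    dist.destroy_process_group()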
2022-12-01T10:36:33.3266743Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102054.xml 2022-12-01T10:36:33.3267387Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3267847Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3268422Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3268953Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3269176Z 2022-12-01T10:36:33.3269295Z Running tests... 2022-12-01T10:36:33.3269711Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3270259Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3270716Z test_nccl_barrier (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3271136Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1004 2022-12-01T10:36:33.3271567Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1005 2022-12-01T10:36:33.3272171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3272607Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3273182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3273647Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3274216Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3274637Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3275203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3275660Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3276094Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3276618Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3277002Z skip: Need at least 4 CUDA devices (3.722s) 2022-12-01T10:36:33.3277194Z 2022-12-01T10:36:33.3277468Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3277779Z Ran 1 test in 3.722s 2022-12-01T10:36:33.3277937Z 2022-12-01T10:36:33.3278043Z OK (skipped=1) 2022-12-01T10:36:33.3278195Z 2022-12-01T10:36:33.3278316Z Generating XML reports... 
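The "skip: Need at least 4 CUDA devices" result above is a GPU-count gate rather than a failure: this runner exposes only two GPUs, so tests that need four are skipped and still count toward an OK run. The suite uses its own decorator for this; a generic stand-in with hypothetical names (require_gpus, BarrierTest) looks like the following.

    import unittest
    import torch

    def require_gpus(n):
        """Skip the decorated test unless at least n CUDA devices are visible."""
        return unittest.skipUnless(
            torch.cuda.is_available() and torch.cuda.device_count() >= n,
            f"Need at least {n} CUDA devices",
        )

    class BarrierTest(unittest.TestCase):
        @require_gpus(4)
        def test_needs_four_gpus(self):
            self.assertGreaterEqual(torch.cuda.device_count(), 4)

    if __name__ == "__main__":
        unittest.main()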
2022-12-01T10:36:33.3278853Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102103.xml 2022-12-01T10:36:33.3279515Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3279966Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3280538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3281015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3281231Z 2022-12-01T10:36:33.3281346Z Running tests... 2022-12-01T10:36:33.3281759Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3282302Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3283087Z test_nccl_barrier_device_ids (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3283528Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1107 2022-12-01T10:36:33.3283966Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1108 2022-12-01T10:36:33.3284576Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3285003Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3285573Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3286043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3286703Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3287147Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3287725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3288184Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3288617Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3289069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3289548Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3290051Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3290705Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3291428Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3291824Z ok (5.313s) 2022-12-01T10:36:33.3291974Z 2022-12-01T10:36:33.3292237Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3292545Z Ran 1 test in 5.313s 2022-12-01T10:36:33.3292705Z 2022-12-01T10:36:33.3292797Z OK 2022-12-01T10:36:33.3292930Z 2022-12-01T10:36:33.3293054Z Generating XML reports... 
2022-12-01T10:36:33.3293599Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102109.xml 2022-12-01T10:36:33.3294241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3294781Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3295366Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3295817Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3296044Z 2022-12-01T10:36:33.3296154Z Running tests... 2022-12-01T10:36:33.3296557Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3297080Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3297565Z test_nccl_barrier_device_ids_function_argument (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3298046Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1223 2022-12-01T10:36:33.3298491Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1224 2022-12-01T10:36:33.3299142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3299588Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3300162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3300626Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3301208Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3301641Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3302209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3302697Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3303131Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3303679Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3304184Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3304662Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3305350Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3306033Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3306434Z ok (3.721s) 2022-12-01T10:36:33.3306589Z 2022-12-01T10:36:33.3306859Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3307176Z Ran 1 test in 3.721s 2022-12-01T10:36:33.3307336Z 2022-12-01T10:36:33.3307428Z OK 2022-12-01T10:36:33.3307561Z 2022-12-01T10:36:33.3307686Z Generating XML reports... 
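test_nccl_barrier_device_ids and its function-argument variant above pass an explicit device list to the barrier so the NCCL backend knows which GPU each rank synchronizes on. A per-rank sketch of that call, assuming two local GPUs and environment-variable rendezvous; the function name, address and port are illustrative.

    import os
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def run(rank: int, world_size: int) -> None:
        # Rendezvous settings are illustrative; real jobs get them from the launcher.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29501")
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)
        # Telling the barrier which device this rank uses is what the
        # device_ids tests above exercise.
        dist.barrier(device_ids=[rank])
        dist.destroy_process_group()

    if __name__ == "__main__":
        # Needs at least two CUDA devices, mirroring the two-rank CI job.
        mp.spawn(run, args=(2,), nprocs=2)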
2022-12-01T10:36:33.3308238Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102117.xml 2022-12-01T10:36:33.3308888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3309322Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3309888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3310338Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3310563Z 2022-12-01T10:36:33.3310672Z Running tests... 2022-12-01T10:36:33.3311071Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3311594Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3312125Z test_nccl_barrier_timeout (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3312582Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1330 2022-12-01T10:36:33.3313026Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1331 2022-12-01T10:36:33.3313616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3314068Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3314638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3315098Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3315652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3316103Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3316673Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3317119Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3317548Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3318018Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3318407Z skip: Need at least 4 CUDA devices (3.733s) 2022-12-01T10:36:33.3318599Z 2022-12-01T10:36:33.3318852Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3319177Z Ran 1 test in 3.733s 2022-12-01T10:36:33.3319336Z 2022-12-01T10:36:33.3319446Z OK (skipped=1) 2022-12-01T10:36:33.3319599Z 2022-12-01T10:36:33.3319704Z Generating XML reports... 
2022-12-01T10:36:33.3320244Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102123.xml 2022-12-01T10:36:33.3320962Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3321423Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3321981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3322864Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3323102Z 2022-12-01T10:36:33.3323214Z Running tests... 2022-12-01T10:36:33.3323601Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3324130Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3324610Z test_nccl_barrier_timeout_new_group (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3325082Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1433 2022-12-01T10:36:33.3325505Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1434 2022-12-01T10:36:33.3326103Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3326554Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3327127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3327573Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3328151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3328594Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3329141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3329716Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3330158Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3330631Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3331007Z skip: Need at least 4 CUDA devices (3.723s) 2022-12-01T10:36:33.3331202Z 2022-12-01T10:36:33.3331475Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3331802Z Ran 1 test in 3.723s 2022-12-01T10:36:33.3331964Z 2022-12-01T10:36:33.3332054Z OK (skipped=1) 2022-12-01T10:36:33.3332207Z 2022-12-01T10:36:33.3332330Z Generating XML reports... 
2022-12-01T10:36:33.3332867Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102129.xml 2022-12-01T10:36:33.3333524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3333963Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3334533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3334996Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3335222Z 2022-12-01T10:36:33.3335313Z Running tests... 2022-12-01T10:36:33.3335714Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3336236Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3336731Z test_nccl_barrier_timeout_new_group_non_member (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3337191Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1536 2022-12-01T10:36:33.3337632Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1537 2022-12-01T10:36:33.3338239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3338743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3339335Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3339796Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3340365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3340785Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3341354Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3341811Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3342249Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3342700Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3343083Z skip: Need at least 4 CUDA devices (3.718s) 2022-12-01T10:36:33.3343273Z 2022-12-01T10:36:33.3343544Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3343850Z Ran 1 test in 3.718s 2022-12-01T10:36:33.3344009Z 2022-12-01T10:36:33.3344118Z OK (skipped=1) 2022-12-01T10:36:33.3344278Z 2022-12-01T10:36:33.3344406Z Generating XML reports... 
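The timeout and new-group variants above (test_nccl_barrier_timeout_new_group and the non-member case) build a sub-group with its own timeout and then synchronize on it; both are skipped here for the same GPU-count reason. A rough sketch of the sub-group construction, assuming the default process group is already initialized for this rank; the function name is illustrative.

    from datetime import timedelta
    import torch.distributed as dist

    def barrier_on_subgroup(rank: int) -> None:
        # Assumes dist.init_process_group(...) has already run for this rank.
        # Every rank must call new_group, even ranks that are not members.
        subgroup = dist.new_group(ranks=[0], timeout=timedelta(seconds=10))
        if rank == 0:
            # Collectives on the sub-group should only be issued by its members;
            # the non_member test checks how other ranks are handled.
            dist.barrier(group=subgroup)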
2022-12-01T10:36:33.3344938Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102134.xml 2022-12-01T10:36:33.3345581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3346030Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3346680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3347134Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3347362Z 2022-12-01T10:36:33.3347473Z Running tests... 2022-12-01T10:36:33.3347876Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3348404Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3348871Z test_nccl_warn_not_in_group_debug_detail (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3349343Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1639 2022-12-01T10:36:33.3349780Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1640 2022-12-01T10:36:33.3350362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3350813Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3351388Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3351853Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3352407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3352847Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3353411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3353852Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3354283Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3354752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3355237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3355769Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3356448Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3357131Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3357661Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:36:33.3358129Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:36:33.3358772Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2022-12-01T10:36:33.3359449Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:36:33.3359844Z ok (5.447s) 2022-12-01T10:36:33.3359995Z 2022-12-01T10:36:33.3360259Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3360566Z Ran 1 test in 5.447s 2022-12-01T10:36:33.3360726Z 2022-12-01T10:36:33.3360819Z OK 2022-12-01T10:36:33.3360948Z 2022-12-01T10:36:33.3361070Z Generating XML reports... 2022-12-01T10:36:33.3361587Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102140.xml 2022-12-01T10:36:33.3362242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3362950Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3363525Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3364075Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3364305Z 2022-12-01T10:36:33.3364414Z Running tests... 2022-12-01T10:36:33.3364822Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3365329Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3365820Z test_nccl_warn_not_in_group_debug_info (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3366323Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1770 2022-12-01T10:36:33.3366754Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1771 2022-12-01T10:36:33.3367341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3367791Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3368368Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3368827Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3369384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3369824Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3370384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3370824Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3371259Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3371742Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3372225Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3372683Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3373437Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:36:33.3373984Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:36:33.3374615Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3375142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:36:33.3375779Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:36:33.3376452Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:36:33.3376829Z ok (5.330s) 2022-12-01T10:36:33.3376975Z 2022-12-01T10:36:33.3377244Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3377569Z Ran 1 test in 5.330s 2022-12-01T10:36:33.3377728Z 2022-12-01T10:36:33.3377822Z OK 2022-12-01T10:36:33.3377935Z 2022-12-01T10:36:33.3378057Z Generating XML reports... 2022-12-01T10:36:33.3378607Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102148.xml 2022-12-01T10:36:33.3379292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3379725Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3380289Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3380761Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3381056Z 2022-12-01T10:36:33.3381168Z Running tests... 2022-12-01T10:36:33.3381548Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3382075Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3382573Z test_nccl_warn_not_in_group_debug_off (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3383018Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1892 2022-12-01T10:36:33.3383447Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1893 2022-12-01T10:36:33.3384039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3384482Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3385032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3385494Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3386067Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3386506Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3387053Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3387511Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3387944Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3388413Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3388890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3389369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3390075Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3390606Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:36:33.3391248Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3391767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:36:33.3392409Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:36:33.3393071Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:36:33.3393457Z ok (5.440s) 2022-12-01T10:36:33.3393606Z 2022-12-01T10:36:33.3393868Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3394174Z Ran 1 test in 5.440s 2022-12-01T10:36:33.3394333Z 2022-12-01T10:36:33.3394425Z OK 2022-12-01T10:36:33.3394560Z 2022-12-01T10:36:33.3394683Z Generating XML reports... 
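The repeated "Added key: store_based_barrier_key:N" and "Completed store-based barrier" INFO lines above come from the rendezvous step performed when a process group is created: every rank bumps a counter in the shared store and polls until it reaches the world size. The real implementation is internal to torch.distributed; a minimal analogue using the public TCPStore API, with illustrative key, port and timeout, looks like this.

    import time
    from datetime import timedelta
    from torch.distributed import TCPStore

    def store_based_barrier(store, world_size: int, key: str = "barrier_key:1",
                            timeout_s: float = 30.0) -> None:
        # Each rank adds 1; add() returns the running total seen so far.
        store.add(key, 1)
        deadline = time.monotonic() + timeout_s
        # add(key, 0) reads the counter without changing it; poll until
        # every rank has checked in.
        while store.add(key, 0) < world_size:
            if time.monotonic() > deadline:
                raise RuntimeError(f"barrier on {key} timed out")
            time.sleep(0.01)

    # Single-rank demonstration; the CI tests run this pattern with two ranks.
    store = TCPStore("127.0.0.1", 29502, 1, True, timedelta(seconds=30))
    store_based_barrier(store, world_size=1)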
2022-12-01T10:36:33.3395219Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102156.xml 2022-12-01T10:36:33.3395864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3396304Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3396869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3397319Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3397543Z 2022-12-01T10:36:33.3397649Z Running tests... 2022-12-01T10:36:33.3398116Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3398642Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3399093Z test_nncl_rank_membership (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3399545Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2014 2022-12-01T10:36:33.3399983Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2015 2022-12-01T10:36:33.3400564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3401011Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3401573Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3402031Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3402846Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3403299Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3403872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3404313Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3404742Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3405224Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3405703Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3406163Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3406821Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3407427Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:36:33.3408091Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3408604Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:36:33.3409241Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2022-12-01T10:36:33.3409911Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:36:33.3410298Z ok (3.842s) 2022-12-01T10:36:33.3410428Z 2022-12-01T10:36:33.3410693Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3411024Z Ran 1 test in 3.842s 2022-12-01T10:36:33.3411182Z 2022-12-01T10:36:33.3411275Z OK 2022-12-01T10:36:33.3411390Z 2022-12-01T10:36:33.3411512Z Generating XML reports... 2022-12-01T10:36:33.3412050Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102203.xml 2022-12-01T10:36:33.3412706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3413133Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3413698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3414157Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3414381Z 2022-12-01T10:36:33.3414488Z Running tests... 2022-12-01T10:36:33.3414867Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3415488Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3415988Z test_pass_nccl_options_high_priority_stream (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3416469Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2123 2022-12-01T10:36:33.3416892Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2124 2022-12-01T10:36:33.3417484Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3417926Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3418464Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3418905Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3419475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3419941Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3420504Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3420965Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3421398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3421939Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3422402Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3422879Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3423530Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:36:33.3424049Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:36:33.3424749Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3425283Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:36:33.3425923Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:36:33.3426580Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:36:33.3426965Z ok (6.445s) 2022-12-01T10:36:33.3427111Z 2022-12-01T10:36:33.3427377Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3427704Z Ran 1 test in 6.445s 2022-12-01T10:36:33.3427848Z 2022-12-01T10:36:33.3427939Z OK 2022-12-01T10:36:33.3428073Z 2022-12-01T10:36:33.3428200Z Generating XML reports... 2022-12-01T10:36:33.3428735Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102209.xml 2022-12-01T10:36:33.3429380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3429825Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3430389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3430853Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3431063Z 2022-12-01T10:36:33.3431170Z Running tests... 2022-12-01T10:36:33.3431566Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3432086Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3432643Z test_sequence_num_incremented_nccl_default (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3433118Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2244 2022-12-01T10:36:33.3433560Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2245 2022-12-01T10:36:33.3434162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3434588Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3435148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3435610Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3436162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3436601Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3437169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3437629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3438044Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3438529Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3439008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3439480Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3440109Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3440786Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3441606Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:36:33.3442548Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:36:33.3443515Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:36:33.3444194Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:36:33.3444580Z ok (5.440s) 2022-12-01T10:36:33.3444728Z 2022-12-01T10:36:33.3444978Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3445301Z Ran 1 test in 5.440s 2022-12-01T10:36:33.3445458Z 2022-12-01T10:36:33.3445549Z OK 2022-12-01T10:36:33.3445680Z 2022-12-01T10:36:33.3445786Z Generating XML reports... 
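The test_pass_nccl_options_high_priority_stream run above passes backend-specific options through process-group creation so NCCL can schedule its collectives on high-priority CUDA streams. Roughly, and assuming the ProcessGroupNCCL.Options surface available in this PyTorch build (the attribute name follows the test's name but should be treated as an assumption), the setup looks like the following.

    import torch.distributed as dist

    def init_with_high_priority_stream(rank: int, world_size: int) -> None:
        # Backend-specific options object; is_high_priority_stream is the knob
        # the test name refers to (assumed attribute name for this build).
        opts = dist.ProcessGroupNCCL.Options()
        opts.is_high_priority_stream = True
        dist.init_process_group(
            "nccl", rank=rank, world_size=world_size, pg_options=opts
        )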
2022-12-01T10:36:33.3446328Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102218.xml 2022-12-01T10:36:33.3446992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3473693Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3474333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3474805Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3475037Z 2022-12-01T10:36:33.3475149Z Running tests... 2022-12-01T10:36:33.3475542Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3476078Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3476587Z test_sequence_num_incremented_nccl_subgroup (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3477241Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2366 2022-12-01T10:36:33.3477675Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2367 2022-12-01T10:36:33.3478287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3478742Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3479295Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3479761Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3480339Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3480779Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3481332Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3481803Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3482240Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3482999Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3483377Z skip: Need at least 4 CUDA devices (3.715s) 2022-12-01T10:36:33.3483570Z 2022-12-01T10:36:33.3483848Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3484182Z Ran 1 test in 3.715s 2022-12-01T10:36:33.3484343Z 2022-12-01T10:36:33.3484434Z OK (skipped=1) 2022-12-01T10:36:33.3484586Z 2022-12-01T10:36:33.3484710Z Generating XML reports... 
2022-12-01T10:36:33.3485252Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102226.xml 2022-12-01T10:36:33.3485917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3486354Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3487026Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3487515Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3487742Z 2022-12-01T10:36:33.3487834Z Running tests... 2022-12-01T10:36:33.3488242Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3488769Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3489255Z test_sequence_num_set_default_pg_nccl (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3489708Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2469 2022-12-01T10:36:33.3490147Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2470 2022-12-01T10:36:33.3490750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3491188Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3491762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3492224Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3492796Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3493219Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3493779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3494235Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3494749Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3495249Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3495735Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3496220Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3496859Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3497547Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3497943Z ok (5.330s) 2022-12-01T10:36:33.3498092Z 2022-12-01T10:36:33.3498359Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3498672Z Ran 1 test in 5.330s 2022-12-01T10:36:33.3498839Z 2022-12-01T10:36:33.3498933Z OK 2022-12-01T10:36:33.3499065Z 2022-12-01T10:36:33.3499191Z Generating XML reports... 
2022-12-01T10:36:33.3499716Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102232.xml 2022-12-01T10:36:33.3500386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3500830Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3501403Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3501860Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3502088Z 2022-12-01T10:36:33.3502204Z Running tests... 2022-12-01T10:36:33.3502604Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3503112Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3503605Z test_sequence_num_set_nccl_new_group (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3504142Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2585 2022-12-01T10:36:33.3504608Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2586 2022-12-01T10:36:33.3505201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3505652Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3506229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3506678Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3507257Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3507697Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3508265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3508716Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3509146Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3509633Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3510115Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3510570Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3511225Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3511916Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3512527Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:36:33.3513005Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:36:33.3513657Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2022-12-01T10:36:33.3514342Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:36:33.3514713Z ok (5.327s) 2022-12-01T10:36:33.3514858Z 2022-12-01T10:36:33.3515124Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3515454Z Ran 1 test in 5.327s 2022-12-01T10:36:33.3515615Z 2022-12-01T10:36:33.3515710Z OK 2022-12-01T10:36:33.3515825Z 2022-12-01T10:36:33.3515954Z Generating XML reports... 2022-12-01T10:36:33.3516490Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102239.xml 2022-12-01T10:36:33.3517161Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3517588Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3518163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3518634Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3518863Z 2022-12-01T10:36:33.3518973Z Running tests... 2022-12-01T10:36:33.3519355Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3519887Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3520356Z test_tensor_dtype_complex (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3520813Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2705 2022-12-01T10:36:33.3521301Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2706 2022-12-01T10:36:33.3521934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3522631Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3523209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3523676Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3524251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3524698Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3525254Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3525729Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3526173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3526646Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3527130Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3527612Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3528274Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3528946Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:36:33.3529452Z ok (6.419s) 2022-12-01T10:36:33.3529605Z 2022-12-01T10:36:33.3529877Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3530203Z Ran 1 test in 6.420s 2022-12-01T10:36:33.3530348Z 2022-12-01T10:36:33.3530446Z OK 2022-12-01T10:36:33.3530579Z 2022-12-01T10:36:33.3530702Z Generating XML reports... 2022-12-01T10:36:33.3531243Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102247.xml 2022-12-01T10:36:33.3531885Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3532340Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3532913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3533378Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3533591Z 2022-12-01T10:36:33.3533709Z Running tests... 2022-12-01T10:36:33.3534108Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3534643Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3535091Z test_tensor_dtype_mismatch (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3535554Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2822 2022-12-01T10:36:33.3536007Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2823 2022-12-01T10:36:33.3536606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3537033Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3537603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3538071Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3538634Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3539153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3539749Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3540218Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3540632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3541126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3541610Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3542094Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3542735Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3543424Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:36:33.3544471Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2441: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2022-12-01T10:36:33.3545093Z warnings.warn( 2022-12-01T10:36:33.3545944Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1577: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2022-12-01T10:36:33.3546643Z warnings.warn( 2022-12-01T10:36:33.3547518Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2441: UserWarning: torch.distributed.all_gather_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2022-12-01T10:36:33.3548127Z warnings.warn( 2022-12-01T10:36:33.3548965Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:1577: UserWarning: torch.distributed.all_reduce_coalesced will be deprecated. If you must use it, please revisit our documentation later at https://pytorch.org/docs/master/distributed.html#collective-functions 2022-12-01T10:36:33.3549592Z warnings.warn( 2022-12-01T10:36:33.3549831Z ok (5.243s) 2022-12-01T10:36:33.3549977Z 2022-12-01T10:36:33.3550248Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3550554Z Ran 1 test in 5.243s 2022-12-01T10:36:33.3550712Z 2022-12-01T10:36:33.3550814Z OK 2022-12-01T10:36:33.3550948Z 2022-12-01T10:36:33.3551071Z Generating XML reports... 2022-12-01T10:36:33.3551593Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102256.xml 2022-12-01T10:36:33.3552267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3552719Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3553296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3553748Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3553975Z 2022-12-01T10:36:33.3554083Z Running tests... 2022-12-01T10:36:33.3554491Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3555016Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3555490Z test_allgather_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3555958Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2931 2022-12-01T10:36:33.3556463Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2932 2022-12-01T10:36:33.3557072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3557522Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3558094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3558565Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3559120Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3559573Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3560143Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3560614Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3561032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3561501Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3561988Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3562714Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3563392Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3564077Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3565117Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3565826Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3566726Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3567439Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3567765Z ok (6.426s) 2022-12-01T10:36:33.3567912Z 2022-12-01T10:36:33.3568162Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3568503Z Ran 1 test in 6.426s 2022-12-01T10:36:33.3568662Z 2022-12-01T10:36:33.3568755Z OK 2022-12-01T10:36:33.3568887Z 2022-12-01T10:36:33.3569009Z Generating XML reports... 
2022-12-01T10:36:33.3569556Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102303.xml 2022-12-01T10:36:33.3570237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3570685Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3571239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3571703Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3571932Z 2022-12-01T10:36:33.3572041Z Running tests... 2022-12-01T10:36:33.3572447Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3572965Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3573452Z test_allreduce_work_wait_gpu (__main__.CompilerTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3573995Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3046 2022-12-01T10:36:33.3574463Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3047 2022-12-01T10:36:33.3575049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3575496Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3576068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3576515Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3577093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3577547Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3578125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3578574Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3579012Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3579478Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3579950Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3580443Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3581101Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3581872Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:36:33.3582780Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3583495Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3584349Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3585068Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3585391Z ok (6.434s) 2022-12-01T10:36:33.3585525Z 2022-12-01T10:36:33.3585795Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3586126Z Ran 1 test in 6.434s 2022-12-01T10:36:33.3586284Z 2022-12-01T10:36:33.3586381Z OK 2022-12-01T10:36:33.3586517Z 2022-12-01T10:36:33.3586623Z Generating XML reports... 2022-12-01T10:36:33.3587176Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102312.xml 2022-12-01T10:36:33.3587856Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3588301Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3588862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3589328Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3589562Z 2022-12-01T10:36:33.3589671Z Running tests... 2022-12-01T10:36:33.3590058Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3590590Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3591153Z test_broadcast_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3591637Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3161 2022-12-01T10:36:33.3592065Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3162 2022-12-01T10:36:33.3592668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3593117Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3593673Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3594147Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3594730Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3595183Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3595734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3596195Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3596638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3597103Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3597574Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3598074Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3598808Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3599482Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3600409Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3601128Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3601982Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3602945Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3603259Z ok (6.425s) 2022-12-01T10:36:33.3603409Z 2022-12-01T10:36:33.3603682Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3604016Z Ran 1 test in 6.425s 2022-12-01T10:36:33.3604178Z 2022-12-01T10:36:33.3604253Z OK 2022-12-01T10:36:33.3604385Z 2022-12-01T10:36:33.3604515Z Generating XML reports... 
2022-12-01T10:36:33.3605075Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102321.xml 2022-12-01T10:36:33.3605756Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3606186Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3606758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3607227Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3607465Z 2022-12-01T10:36:33.3607575Z Running tests... 2022-12-01T10:36:33.3607959Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3608580Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3609090Z test_consecutive_comm_work_wait_gpu (__main__.CompilerTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3609557Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3276 2022-12-01T10:36:33.3610008Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3277 2022-12-01T10:36:33.3610624Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3611069Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3611627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3612107Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3612685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3613119Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3613691Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3614161Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3614598Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3615056Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3615544Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3616043Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3616830Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3617508Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:36:33.3618438Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3619166Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3620012Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3620719Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3621581Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant2 target _tensor_constant2 _tensor_constant2 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3622297Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3623147Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant3 target _tensor_constant3 _tensor_constant3 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3623855Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3624682Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3625451Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3626309Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3627006Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3627822Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant2 target _tensor_constant2 _tensor_constant2 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3628544Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3629389Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant3 target _tensor_constant3 _tensor_constant3 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3630092Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3630419Z ok (6.458s) 2022-12-01T10:36:33.3630571Z 2022-12-01T10:36:33.3630860Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3631188Z Ran 1 test in 6.458s 2022-12-01T10:36:33.3631350Z 2022-12-01T10:36:33.3631451Z OK 2022-12-01T10:36:33.3631583Z 2022-12-01T10:36:33.3631689Z Generating XML reports... 
2022-12-01T10:36:33.3632251Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102329.xml 2022-12-01T10:36:33.3633008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3633457Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3634010Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3634475Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3634702Z 2022-12-01T10:36:33.3634809Z Running tests... 2022-12-01T10:36:33.3635196Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3635725Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3636216Z test_nested_comm_tensor_wrapping (__main__.CompilerTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3636683Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3391 2022-12-01T10:36:33.3637117Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3392 2022-12-01T10:36:33.3637720Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3638169Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3638725Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3639194Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3639756Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3640202Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3640745Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3641202Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3641642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3642167Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3642893Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3643388Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3644056Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3644752Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:36:33.3645657Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3646383Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3647235Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3647952Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3648786Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3649496Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3650457Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3651204Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3651526Z ok (6.404s) 2022-12-01T10:36:33.3651656Z 2022-12-01T10:36:33.3651927Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3652251Z Ran 1 test in 6.405s 2022-12-01T10:36:33.3652411Z 2022-12-01T10:36:33.3652508Z OK 2022-12-01T10:36:33.3652622Z 2022-12-01T10:36:33.3652745Z Generating XML reports... 2022-12-01T10:36:33.3653303Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102338.xml 2022-12-01T10:36:33.3653978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3654432Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3654997Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3655490Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3655720Z 2022-12-01T10:36:33.3655829Z Running tests... 2022-12-01T10:36:33.3656214Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3656747Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3657237Z test_reduce_scatter_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3657709Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3506 2022-12-01T10:36:33.3658141Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3507 2022-12-01T10:36:33.3658747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3659205Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3659843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3660342Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3660920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3661369Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3661918Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3662379Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3662816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3663285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3663755Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3664258Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3664916Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3665588Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:36:33.3666552Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3667352Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3668209Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3668921Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3669745Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3670454Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3671306Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant1 target _tensor_constant1 _tensor_constant1 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3672025Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3672389Z ok (6.416s) 2022-12-01T10:36:33.3672541Z 2022-12-01T10:36:33.3672809Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3673138Z Ran 1 test in 6.416s 2022-12-01T10:36:33.3673299Z 2022-12-01T10:36:33.3673395Z OK 2022-12-01T10:36:33.3673509Z 2022-12-01T10:36:33.3673635Z Generating XML reports... 2022-12-01T10:36:33.3674189Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102347.xml 2022-12-01T10:36:33.3674862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3675292Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3675870Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3676337Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3676636Z 2022-12-01T10:36:33.3676755Z Running tests... 2022-12-01T10:36:33.3677149Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3677677Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3678152Z test_scatter_work_wait_gpu (__main__.CompilerTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3678599Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3621 2022-12-01T10:36:33.3679047Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3622 2022-12-01T10:36:33.3679648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3680097Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3680651Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3681119Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3681701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3682131Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3683013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3683489Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3683934Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3684399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3684992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.3685498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.3686161Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3686834Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.3687760Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3688482Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3689371Z /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py:1306: UserWarning: Node _tensor_constant0 target _tensor_constant0 _tensor_constant0 of does not reference an nn.Module, nn.Parameter, or buffer, which is what 'get_attr' Nodes typically target 2022-12-01T10:36:33.3690072Z warnings.warn(f'Node {node} target {node.target} {atom} of {seen_qualname} does ' 2022-12-01T10:36:33.3690399Z ok (6.519s) 2022-12-01T10:36:33.3690548Z 2022-12-01T10:36:33.3690817Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3691144Z Ran 1 test in 6.519s 2022-12-01T10:36:33.3691288Z 2022-12-01T10:36:33.3691381Z OK 2022-12-01T10:36:33.3691514Z 2022-12-01T10:36:33.3691638Z Generating XML reports... 
2022-12-01T10:36:33.3692189Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102355.xml 2022-12-01T10:36:33.3692848Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3693303Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3693879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3694431Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3694660Z 2022-12-01T10:36:33.3694774Z Running tests... 2022-12-01T10:36:33.3695184Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3695713Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3696225Z test_accumulate_gradients_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3696730Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3738 2022-12-01T10:36:33.3697173Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3739 2022-12-01T10:36:33.3697777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3698211Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3698793Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3699260Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3699818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3700257Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3700819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3701277Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3701691Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3702233Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3702741Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppdxqz5if 2022-12-01T10:36:33.3703283Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppdxqz5if/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3703797Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7j4rz6lf 2022-12-01T10:36:33.3704329Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7j4rz6lf/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3704838Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3705301Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:36:33.3705648Z ok (6.937s) 2022-12-01T10:36:33.3705795Z 2022-12-01T10:36:33.3706071Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3706406Z Ran 1 test in 6.937s 2022-12-01T10:36:33.3706550Z 2022-12-01T10:36:33.3706643Z OK 2022-12-01T10:36:33.3706776Z 2022-12-01T10:36:33.3706900Z Generating XML reports... 2022-12-01T10:36:33.3707514Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102404.xml 2022-12-01T10:36:33.3708219Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3708668Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3709241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3709711Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3709922Z 2022-12-01T10:36:33.3710032Z Running tests... 2022-12-01T10:36:33.3710434Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3710961Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3711563Z test_accumulate_gradients_module_with_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3712110Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3859 2022-12-01T10:36:33.3712553Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3860 2022-12-01T10:36:33.3713161Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3713592Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3714168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3714636Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3715196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3715640Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3716213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3716677Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3717096Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3717565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3718062Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpymwg0634 2022-12-01T10:36:33.3718603Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpymwg0634/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3719118Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfghq6fnr 2022-12-01T10:36:33.3719782Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfghq6fnr/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3720298Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:36:33.3720764Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3721120Z ok (6.930s) 2022-12-01T10:36:33.3721270Z 2022-12-01T10:36:33.3721544Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3721925Z Ran 1 test in 6.930s 2022-12-01T10:36:33.3722067Z 2022-12-01T10:36:33.3722162Z OK 2022-12-01T10:36:33.3722297Z 2022-12-01T10:36:33.3722641Z Generating XML reports... 2022-12-01T10:36:33.3723277Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102413.xml 2022-12-01T10:36:33.3723978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3724436Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3725013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3725481Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3725691Z 2022-12-01T10:36:33.3725800Z Running tests... 2022-12-01T10:36:33.3726200Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3726730Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3727241Z test_arbitrary_forward_return_value (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3727750Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3980 2022-12-01T10:36:33.3728194Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3981 2022-12-01T10:36:33.3728803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3729317Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3729920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3730392Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3730947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3731393Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3731962Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3732427Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3732848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3733325Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3733821Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkz8p3zgl 2022-12-01T10:36:33.3734361Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkz8p3zgl/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3734878Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvka5w0pg 2022-12-01T10:36:33.3735411Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvka5w0pg/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3735921Z 
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3736388Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3736833Z ok (6.914s) 2022-12-01T10:36:33.3736983Z 2022-12-01T10:36:33.3737256Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3737586Z Ran 1 test in 6.914s 2022-12-01T10:36:33.3737729Z 2022-12-01T10:36:33.3737827Z OK 2022-12-01T10:36:33.3737963Z 2022-12-01T10:36:33.3738087Z Generating XML reports... 2022-12-01T10:36:33.3738699Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102422.xml 2022-12-01T10:36:33.3739404Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3739855Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3740429Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3740895Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3741103Z 2022-12-01T10:36:33.3741220Z Running tests... 2022-12-01T10:36:33.3741622Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3742160Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3742689Z test_arbitrary_forward_return_value_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3743212Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4101 2022-12-01T10:36:33.3743661Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4102 2022-12-01T10:36:33.3744260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3744694Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3745269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3745744Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3746365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3746825Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3747397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3747863Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3748283Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3748752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3749245Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdk3v3ycx 2022-12-01T10:36:33.3749781Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdk3v3ycx/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3750301Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph66z3vxk 2022-12-01T10:36:33.3750837Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmph66z3vxk/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3751349Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3751808Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3752159Z ok (6.820s) 2022-12-01T10:36:33.3752307Z 2022-12-01T10:36:33.3752578Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3752909Z Ran 1 test in 6.820s 2022-12-01T10:36:33.3753051Z 2022-12-01T10:36:33.3753145Z OK 2022-12-01T10:36:33.3753278Z 2022-12-01T10:36:33.3753401Z Generating XML reports... 2022-12-01T10:36:33.3754012Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102432.xml 2022-12-01T10:36:33.3754792Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3755246Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3755814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3756281Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3756492Z 2022-12-01T10:36:33.3756603Z Running tests... 2022-12-01T10:36:33.3757009Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3757532Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3758041Z test_bf16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3758545Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4222 2022-12-01T10:36:33.3759001Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4223 2022-12-01T10:36:33.3759614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3760048Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3760621Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3761087Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3761651Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3762094Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3762933Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3763411Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3763831Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3764705Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.3765491Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3766311Z 
INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.3767118Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5st0j6_y 2022-12-01T10:36:33.3767644Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5st0j6_y/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3768178Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpokchlgsf 2022-12-01T10:36:33.3768714Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpokchlgsf/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3769078Z ok (6.325s) 2022-12-01T10:36:33.3769227Z 2022-12-01T10:36:33.3769509Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3769839Z Ran 1 test in 6.325s 2022-12-01T10:36:33.3770000Z 2022-12-01T10:36:33.3770095Z OK 2022-12-01T10:36:33.3770209Z 2022-12-01T10:36:33.3770335Z Generating XML reports... 2022-12-01T10:36:33.3770947Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102441.xml 2022-12-01T10:36:33.3771772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3772207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3772784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3773250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3773477Z 2022-12-01T10:36:33.3773586Z Running tests... 2022-12-01T10:36:33.3773971Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3774495Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3775016Z test_bf16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3775528Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4343 2022-12-01T10:36:33.3775957Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4344 2022-12-01T10:36:33.3776560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3777017Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3777572Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3778033Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3778605Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3779044Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3779590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3780053Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3780542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3781331Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.3782098Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3782876Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.3783666Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3nryw5ji 2022-12-01T10:36:33.3784209Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3nryw5ji/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3784725Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2e3rolvo 2022-12-01T10:36:33.3785253Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2e3rolvo/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3785630Z ok (6.436s) 2022-12-01T10:36:33.3785777Z 2022-12-01T10:36:33.3786053Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3786365Z Ran 1 test in 6.436s 2022-12-01T10:36:33.3786528Z 2022-12-01T10:36:33.3786624Z OK 2022-12-01T10:36:33.3786762Z 2022-12-01T10:36:33.3786889Z Generating XML reports... 
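The two test_bf16_compress_wrapper_* cases above log a PowerSGD config (matrix_approximation_rank = 1, start_powerSGD_iter = 1000, min_compression_rate = 2, ...), which corresponds to wrapping the PowerSGD gradient-compression hook in a bfloat16 cast before communication. A minimal sketch of registering such a hook, assuming `ddp` is an existing DistributedDataParallel model as in the earlier sketch and that the NCCL build supports bfloat16 (an assumption, not something the log states):

from torch.distributed.algorithms.ddp_comm_hooks import default_hooks, powerSGD_hook

# State values chosen to mirror the config printed in the log above.
state = powerSGD_hook.PowerSGDState(
    process_group=None,            # None selects the default process group
    matrix_approximation_rank=1,
    start_powerSGD_iter=1000,
    min_compression_rate=2,
)
# bf16_compress_wrapper casts gradients to bfloat16 before the wrapped hook runs
# and casts the compressed result back afterwards.
ddp.register_comm_hook(state, default_hooks.bf16_compress_wrapper(powerSGD_hook.powerSGD_hook))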
2022-12-01T10:36:33.3787487Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102449.xml 2022-12-01T10:36:33.3788300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3788755Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3789330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3789780Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3790004Z 2022-12-01T10:36:33.3790112Z Running tests... 2022-12-01T10:36:33.3790514Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3791018Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3791536Z test_builtin_ddp_comm_hooks_nccl (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3792041Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4464 2022-12-01T10:36:33.3792489Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4465 2022-12-01T10:36:33.3793079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3793529Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3794101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3794548Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3795118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3795559Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3796121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3796567Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3797057Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3797549Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3798053Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzy1c0i3n 2022-12-01T10:36:33.3798560Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp763ne3mk 2022-12-01T10:36:33.3799099Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzy1c0i3n/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3799635Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp763ne3mk/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3799738Z ok (6.532s) 2022-12-01T10:36:33.3799759Z 2022-12-01T10:36:33.3800021Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3800135Z Ran 1 test in 6.532s 2022-12-01T10:36:33.3800155Z 2022-12-01T10:36:33.3800247Z OK 2022-12-01T10:36:33.3800266Z 2022-12-01T10:36:33.3800394Z Generating XML reports... 
2022-12-01T10:36:33.3800858Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102458.xml 2022-12-01T10:36:33.3801230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3801408Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3801786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3801976Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3801996Z 2022-12-01T10:36:33.3802087Z Running tests... 2022-12-01T10:36:33.3802642Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3802977Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3803277Z test_builtin_ddp_comm_hooks_nccl_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3803497Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4585 2022-12-01T10:36:33.3803713Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4586 2022-12-01T10:36:33.3804084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3804258Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3804619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3804811Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3805181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3805359Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3805734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3805924Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3806155Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3806385Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3806646Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5cwabqt0 2022-12-01T10:36:33.3806899Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5cwabqt0/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3807157Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg601vlv2 2022-12-01T10:36:33.3807420Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg601vlv2/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3807604Z ok (6.441s) 2022-12-01T10:36:33.3807629Z 2022-12-01T10:36:33.3807912Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3808029Z Ran 1 test in 6.441s 2022-12-01T10:36:33.3808048Z 2022-12-01T10:36:33.3808142Z OK 2022-12-01T10:36:33.3808160Z 2022-12-01T10:36:33.3808284Z Generating XML reports... 
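test_builtin_ddp_comm_hooks_nccl and its grad_is_view variant above exercise DDP's built-in gradient communication hooks. The simplest public way to attach one is register_comm_hook with a stock hook from default_hooks; a minimal sketch, again assuming an already constructed DDP model named `ddp` (a placeholder, not taken from the log):

from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

# fp16_compress_hook halves communication volume by all-reducing gradients in
# float16; state=None means "use the default process group".
ddp.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)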
2022-12-01T10:36:33.3808724Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102507.xml 2022-12-01T10:36:33.3809098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3809274Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3809657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3809847Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3809870Z 2022-12-01T10:36:33.3809978Z Running tests... 2022-12-01T10:36:33.3810241Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3810553Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3810828Z test_channels_last_contig (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3811027Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4706 2022-12-01T10:36:33.3811239Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4707 2022-12-01T10:36:33.3811607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3811863Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3812246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3812440Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3812808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3812983Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3813335Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3813523Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3813752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3813981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3814089Z ok (6.437s) 2022-12-01T10:36:33.3814109Z 2022-12-01T10:36:33.3814371Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3814486Z Ran 1 test in 6.437s 2022-12-01T10:36:33.3814505Z 2022-12-01T10:36:33.3814598Z OK 2022-12-01T10:36:33.3814617Z 2022-12-01T10:36:33.3814724Z Generating XML reports... 
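test_channels_last_contig above checks DDP collectives on tensors in channels-last (NHWC) memory format. A small illustrative snippet of what such a tensor looks like (not the test's own code):

import torch

# Same logical NCHW shape, but strides arranged so the channel dimension is innermost.
x = torch.randn(2, 3, 8, 8).to(memory_format=torch.channels_last)
assert x.is_contiguous(memory_format=torch.channels_last)
assert not x.is_contiguous()  # no longer contiguous in the default NCHW sense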
2022-12-01T10:36:33.3815182Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102515.xml 2022-12-01T10:36:33.3815550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3815726Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3816099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3816288Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3816311Z 2022-12-01T10:36:33.3816419Z Running tests... 2022-12-01T10:36:33.3816678Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3817057Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3817278Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3817648Z Dynamic module can be checkpointed, multiple times, with non-reentrant ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3817866Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4823 2022-12-01T10:36:33.3818083Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4824 2022-12-01T10:36:33.3818453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3818630Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3819009Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3819202Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3819548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3819721Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3820100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3820288Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3820517Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3820743Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3821000Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5vo6xs_d 2022-12-01T10:36:33.3821335Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5vo6xs_d/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3821601Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphy7xxmfv 2022-12-01T10:36:33.3821852Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphy7xxmfv/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3821958Z ok (5.834s) 2022-12-01T10:36:33.3821978Z 2022-12-01T10:36:33.3822250Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3822362Z Ran 1 test in 5.834s 2022-12-01T10:36:33.3822382Z 2022-12-01T10:36:33.3822474Z OK 2022-12-01T10:36:33.3822493Z 2022-12-01T10:36:33.3822618Z Generating XML reports... 
2022-12-01T10:36:33.3823073Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102524.xml 2022-12-01T10:36:33.3823443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3823604Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3823983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3824174Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3824193Z 2022-12-01T10:36:33.3824307Z Running tests... 2022-12-01T10:36:33.3824567Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3824876Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3825118Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3825385Z Dynamic module can be checkpointed multiple times with weight sharing ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3825607Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4943 2022-12-01T10:36:33.3825804Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4944 2022-12-01T10:36:33.3826225Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3826414Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3826797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3826989Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3827357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3827533Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3827905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3828078Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3828312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3828538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3828794Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa8w7t1qu 2022-12-01T10:36:33.3829060Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa8w7t1qu/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3829311Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkt6h5ck1 2022-12-01T10:36:33.3829578Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkt6h5ck1/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3829681Z ok (5.817s) 2022-12-01T10:36:33.3829701Z 2022-12-01T10:36:33.3829968Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3830124Z Ran 1 test in 5.817s 2022-12-01T10:36:33.3830144Z 2022-12-01T10:36:33.3830238Z OK 2022-12-01T10:36:33.3830257Z 2022-12-01T10:36:33.3830381Z Generating XML reports... 
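The test_ddp_checkpointing_* cases in this stretch combine DistributedDataParallel with activation checkpointing, where a layer's forward is re-run during backward instead of its activations being saved. A minimal, hypothetical module showing the non-reentrant form referenced by the use_reentrant_False test names:

import torch
from torch.utils.checkpoint import checkpoint

class CheckpointedBlock(torch.nn.Module):
    """Hypothetical module: `inner` is recomputed during backward."""
    def __init__(self):
        super().__init__()
        self.inner = torch.nn.Linear(8, 8)

    def forward(self, x):
        # use_reentrant=False selects the non-reentrant checkpointing implementation.
        return checkpoint(self.inner, x, use_reentrant=False)

y = CheckpointedBlock()(torch.randn(4, 8, requires_grad=True))
y.sum().backward()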
2022-12-01T10:36:33.3830849Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102532.xml 2022-12-01T10:36:33.3831221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3831397Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3831773Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3831968Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3831989Z 2022-12-01T10:36:33.3832101Z Running tests... 2022-12-01T10:36:33.3832343Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3832664Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3832906Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3833162Z DDP works as expected when layer is checkpointed only once. ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3833411Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5063 2022-12-01T10:36:33.3833655Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5064 2022-12-01T10:36:33.3834034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3834211Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3834594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3834773Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3835138Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3835362Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3835750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3835941Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3836173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3836400Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3836655Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp9efvmoc 2022-12-01T10:36:33.3836925Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp9efvmoc/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3837165Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplkm8kr8p 2022-12-01T10:36:33.3837436Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplkm8kr8p/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3837671Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3837905Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3838135Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:36:33.3838364Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3839280Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T10:36:33.3839456Z warnings.warn( 2022-12-01T10:36:33.3840379Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T10:36:33.3840495Z warnings.warn( 2022-12-01T10:36:33.3840710Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3840939Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3841170Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3841401Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3841626Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3841851Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3842076Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3842303Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3842597Z ok (5.932s) 2022-12-01T10:36:33.3842622Z 2022-12-01T10:36:33.3842909Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3843023Z Ran 1 test in 5.933s 2022-12-01T10:36:33.3843044Z 2022-12-01T10:36:33.3843136Z OK 2022-12-01T10:36:33.3843155Z 2022-12-01T10:36:33.3843279Z Generating XML reports... 2022-12-01T10:36:33.3843743Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102540.xml 2022-12-01T10:36:33.3844122Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3844380Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3844781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3844957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3844976Z 2022-12-01T10:36:33.3845086Z Running tests... 2022-12-01T10:36:33.3845350Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3845665Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3845906Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3846152Z DDP works as expected when layer is checkpointed only once. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3846373Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5183 2022-12-01T10:36:33.3846592Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5184 2022-12-01T10:36:33.3846945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3847120Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3847497Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3847689Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3848054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3848226Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3848687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3848879Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3849109Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3849312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3849573Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi0_8_m_v 2022-12-01T10:36:33.3849842Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi0_8_m_v/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3850097Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpskugn3co 2022-12-01T10:36:33.3850365Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpskugn3co/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3850598Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3850837Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3851070Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3851282Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3852196Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T10:36:33.3852312Z warnings.warn( 2022-12-01T10:36:33.3853271Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T10:36:33.3853400Z warnings.warn( 2022-12-01T10:36:33.3853638Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
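The UserWarning above about `_set_static_graph` is raised when DDP is told the training graph will not change across iterations; in that mode DDP can detect unused parameters itself, so passing find_unused_parameters=True becomes redundant. The public spelling of the same setting is the static_graph constructor argument; a minimal sketch, assuming a process group initialized by the launcher:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")   # assumes torchrun-style env vars
model = torch.nn.Linear(8, 8).cuda()
# static_graph=True corresponds to the internal _set_static_graph() call seen in
# the warnings: DDP records the autograd graph on the first iteration and reuses it.
ddp = DDP(model, static_graph=True)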
2022-12-01T10:36:33.3853866Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3854095Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3854326Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3854556Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3854763Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3854990Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3855218Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3855318Z ok (5.937s) 2022-12-01T10:36:33.3855339Z 2022-12-01T10:36:33.3855609Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3855722Z Ran 1 test in 5.937s 2022-12-01T10:36:33.3855742Z 2022-12-01T10:36:33.3855837Z OK 2022-12-01T10:36:33.3855856Z 2022-12-01T10:36:33.3855980Z Generating XML reports... 2022-12-01T10:36:33.3856441Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102548.xml 2022-12-01T10:36:33.3856795Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3856971Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3857415Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3857607Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3857630Z 2022-12-01T10:36:33.3857741Z Running tests... 2022-12-01T10:36:33.3858005Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3858318Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3858583Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3858913Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3859132Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5303 2022-12-01T10:36:33.3859351Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5304 2022-12-01T10:36:33.3859723Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3859900Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3860281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3860469Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3860834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3861004Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3861358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3861545Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3861774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3862002Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3862308Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph_36mbsa 2022-12-01T10:36:33.3862583Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph_36mbsa/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3862832Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt5jytx68 2022-12-01T10:36:33.3863095Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt5jytx68/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3863312Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3863537Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3863768Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3864002Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3864103Z ok (5.927s) 2022-12-01T10:36:33.3864123Z 2022-12-01T10:36:33.3864397Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3864506Z Ran 1 test in 5.928s 2022-12-01T10:36:33.3864526Z 2022-12-01T10:36:33.3864619Z OK 2022-12-01T10:36:33.3864638Z 2022-12-01T10:36:33.3864759Z Generating XML reports... 2022-12-01T10:36:33.3865203Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102557.xml 2022-12-01T10:36:33.3865571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3865739Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3866112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3866397Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3866418Z 2022-12-01T10:36:33.3866529Z Running tests... 
2022-12-01T10:36:33.3866804Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3867121Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3867366Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3867713Z Regardless of reentrant or non-reentrant checkpointing impl, ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3867928Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5423 2022-12-01T10:36:33.3868140Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5424 2022-12-01T10:36:33.3868505Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3868678Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3869059Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3869250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3869613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3869768Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3870136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3870321Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3870541Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3870761Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3871020Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnenif05z 2022-12-01T10:36:33.3871347Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnenif05z/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3871613Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9padtli8 2022-12-01T10:36:33.3871864Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9padtli8/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3872098Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3872332Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3872560Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3872787Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3872883Z ok (5.839s) 2022-12-01T10:36:33.3872904Z 2022-12-01T10:36:33.3873175Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3873287Z Ran 1 test in 5.839s 2022-12-01T10:36:33.3873310Z 2022-12-01T10:36:33.3873385Z OK 2022-12-01T10:36:33.3873423Z 2022-12-01T10:36:33.3873528Z Generating XML reports... 
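The `[W reducer.cpp:1305]` warnings that appear just below are DDP noting that find_unused_parameters=True was requested but every parameter did receive a gradient, so the extra autograd-graph traversal is wasted work. A hypothetical setup that would trigger the same warning (not the test's own code, and assuming a launcher-provided process group):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")   # assumes torchrun-style env vars
model = torch.nn.Linear(8, 8).cuda()
# All parameters are used in the forward pass, so this flag buys nothing and the
# reducer emits the warning after the first backward pass.
ddp = DDP(model, find_unused_parameters=True)
ddp(torch.randn(4, 8, device="cuda")).sum().backward()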
2022-12-01T10:36:33.3873982Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102605.xml 2022-12-01T10:36:33.3874348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3874521Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3874898Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3875078Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3875151Z 2022-12-01T10:36:33.3875267Z Running tests... 2022-12-01T10:36:33.3875532Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3875829Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3876076Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3876450Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3876668Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5543 2022-12-01T10:36:33.3876879Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5544 2022-12-01T10:36:33.3877245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3877412Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3877784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3877939Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3878314Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3878496Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3878870Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3879061Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3879291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3879516Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3879773Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsiylpvyp 2022-12-01T10:36:33.3880041Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsiylpvyp/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3880353Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp975g9e6q 2022-12-01T10:36:33.3880624Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp975g9e6q/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3880850Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3881083Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3881864Z [W reducer.cpp:1305] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. 
This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-12-01T10:36:33.3882894Z [W reducer.cpp:1305] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-12-01T10:36:33.3883143Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3883367Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3883555Z ok (5.929s) 2022-12-01T10:36:33.3883575Z 2022-12-01T10:36:33.3883851Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3883962Z Ran 1 test in 5.929s 2022-12-01T10:36:33.3883981Z 2022-12-01T10:36:33.3884068Z OK 2022-12-01T10:36:33.3884088Z 2022-12-01T10:36:33.3884194Z Generating XML reports... 2022-12-01T10:36:33.3884657Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102613.xml 2022-12-01T10:36:33.3885025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3885190Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3885562Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3885749Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3885773Z 2022-12-01T10:36:33.3885876Z Running tests... 2022-12-01T10:36:33.3886134Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3886440Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3886663Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3887034Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3887248Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5663 2022-12-01T10:36:33.3887455Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5664 2022-12-01T10:36:33.3887821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3887988Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3888361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3888547Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3888965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3889145Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3889512Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3889690Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3889911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3890126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3890371Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2tmth01e 2022-12-01T10:36:33.3890636Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2tmth01e/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3890879Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprdt7_iwz 2022-12-01T10:36:33.3891128Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprdt7_iwz/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3891221Z ok (5.935s) 2022-12-01T10:36:33.3891241Z 2022-12-01T10:36:33.3891500Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3891602Z Ran 1 test in 5.935s 2022-12-01T10:36:33.3891622Z 2022-12-01T10:36:33.3891702Z OK 2022-12-01T10:36:33.3891721Z 2022-12-01T10:36:33.3891834Z Generating XML reports... 2022-12-01T10:36:33.3892279Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102621.xml 2022-12-01T10:36:33.3892636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3892853Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3893226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3893412Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3893432Z 2022-12-01T10:36:33.3893534Z Running tests... 2022-12-01T10:36:33.3893788Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3894088Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3894323Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3894584Z Checkpointing should work with static graph in the case of checkpointing ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3894791Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5783 2022-12-01T10:36:33.3894992Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5784 2022-12-01T10:36:33.3895355Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3895521Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3895888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3896075Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3896435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3896603Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3896970Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3897160Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3897369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3897694Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3897956Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkhg66xa2 2022-12-01T10:36:33.3898213Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkhg66xa2/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3898456Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7benw5w9 2022-12-01T10:36:33.3898720Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7benw5w9/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3898952Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3899179Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3899398Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3899623Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3899719Z ok (5.844s) 2022-12-01T10:36:33.3899738Z 2022-12-01T10:36:33.3899996Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3900099Z Ran 1 test in 5.844s 2022-12-01T10:36:33.3900119Z 2022-12-01T10:36:33.3900202Z OK 2022-12-01T10:36:33.3900221Z 2022-12-01T10:36:33.3900336Z Generating XML reports... 2022-12-01T10:36:33.3900783Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102629.xml 2022-12-01T10:36:33.3901143Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3901300Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3901736Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3901924Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3901944Z 2022-12-01T10:36:33.3902043Z Running tests... 
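The test_ddp_checkpointing_* cases above exercise activation checkpointing under DDP. A minimal sketch of the underlying API, with hypothetical layer sizes and independent of DDP, selecting the reentrant vs. non-reentrant implementation via use_reentrant:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

layer = nn.Linear(20, 20)
head = nn.Linear(20, 20)
x = torch.randn(4, 20, requires_grad=True)

# checkpoint the same layer twice, as in the "checkpointing twice" tests;
# use_reentrant=False selects the non-reentrant autograd implementation
h = checkpoint(layer, x, use_reentrant=False)
h = checkpoint(layer, h, use_reentrant=False)
head(h).sum().backward()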
2022-12-01T10:36:33.3902293Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3902594Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3902839Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3903097Z With reentrant autograd checkpointing impl, DDP will fail when there are ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3903296Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5903 2022-12-01T10:36:33.3903501Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5904 2022-12-01T10:36:33.3903864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3904033Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3904401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3904584Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3904935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3905097Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3905461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3905631Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3905854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3906069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3906374Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfe2jcvdv 2022-12-01T10:36:33.3906646Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfe2jcvdv/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3906887Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp971o3z8h 2022-12-01T10:36:33.3907140Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp971o3z8h/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3907917Z [W reducer.cpp:1305] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-12-01T10:36:33.3908691Z [W reducer.cpp:1305] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2022-12-01T10:36:33.3909599Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T10:36:33.3909770Z warnings.warn( 2022-12-01T10:36:33.3910680Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T10:36:33.3910787Z warnings.warn( 2022-12-01T10:36:33.3911020Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3911238Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3911462Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3911692Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3911788Z ok (6.057s) 2022-12-01T10:36:33.3911809Z 2022-12-01T10:36:33.3912072Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3912181Z Ran 1 test in 6.057s 2022-12-01T10:36:33.3912201Z 2022-12-01T10:36:33.3912287Z OK 2022-12-01T10:36:33.3912306Z 2022-12-01T10:36:33.3912424Z Generating XML reports... 2022-12-01T10:36:33.3912868Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102637.xml 2022-12-01T10:36:33.3913234Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3913403Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3913772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3913965Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3913985Z 2022-12-01T10:36:33.3914086Z Running tests... 2022-12-01T10:36:33.3914392Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3914717Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3914955Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3915224Z With reentrant autograd checkpointing impl, DDP will fail when there are ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3915436Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6023 2022-12-01T10:36:33.3915650Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6024 2022-12-01T10:36:33.3916017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3916190Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3916568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3916752Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3917108Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3917261Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3917631Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3917815Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3918043Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3918329Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3918584Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4giuy8xe 2022-12-01T10:36:33.3918852Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4giuy8xe/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3919096Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm7q0srsj 2022-12-01T10:36:33.3919358Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm7q0srsj/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3920252Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T10:36:33.3920364Z warnings.warn( 2022-12-01T10:36:33.3921271Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T10:36:33.3921382Z warnings.warn( 2022-12-01T10:36:33.3921610Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3921831Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3922060Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3922283Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
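The distributed.py:1772 UserWarning repeated above refers to DDP's static-graph mode. These tests call the internal _set_static_graph; the public equivalent (a sketch with hypothetical shapes, assuming an initialized process group) is the static_graph constructor argument, which makes find_unused_parameters unnecessary:

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

model = nn.Linear(10, 10).cuda()
ddp_model = DDP(
    model,
    device_ids=[0],
    static_graph=True,  # the set of used parameters is assumed not to change across iterations
)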
2022-12-01T10:36:33.3922603Z ok (5.938s) 2022-12-01T10:36:33.3922634Z 2022-12-01T10:36:33.3922919Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3923020Z Ran 1 test in 5.938s 2022-12-01T10:36:33.3923040Z 2022-12-01T10:36:33.3923134Z OK 2022-12-01T10:36:33.3923235Z 2022-12-01T10:36:33.3923369Z Generating XML reports... 2022-12-01T10:36:33.3923830Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102646.xml 2022-12-01T10:36:33.3924202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3924373Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3924749Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3924938Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3924957Z 2022-12-01T10:36:33.3925053Z Running tests... 2022-12-01T10:36:33.3925315Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3925625Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3925884Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3926116Z Test that checkpointing with weight sharing works. ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3926324Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6143 2022-12-01T10:36:33.3926541Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6144 2022-12-01T10:36:33.3926902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3927069Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3927427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3927697Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3928066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3928230Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3928602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3928783Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3929003Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3929224Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3929464Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc691co_6 2022-12-01T10:36:33.3929729Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc691co_6/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3929977Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzgubwqid 2022-12-01T10:36:33.3930249Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzgubwqid/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3930474Z INFO:torch.nn.parallel.distributed:Reducer 
buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3930702Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3930930Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3931150Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3931246Z ok (5.933s) 2022-12-01T10:36:33.3931266Z 2022-12-01T10:36:33.3931518Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3931634Z Ran 1 test in 5.933s 2022-12-01T10:36:33.3931653Z 2022-12-01T10:36:33.3931737Z OK 2022-12-01T10:36:33.3931755Z 2022-12-01T10:36:33.3931873Z Generating XML reports... 2022-12-01T10:36:33.3932386Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102654.xml 2022-12-01T10:36:33.3932769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3932938Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3933311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3933486Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3933519Z 2022-12-01T10:36:33.3933611Z Running tests... 2022-12-01T10:36:33.3933867Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3934182Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3934432Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3934670Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3934888Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6263 2022-12-01T10:36:33.3935101Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6264 2022-12-01T10:36:33.3935462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3935619Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3935978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3936145Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3936583Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3936772Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3937144Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3937327Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3937548Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3937759Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3938011Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwvogi19l 2022-12-01T10:36:33.3938271Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwvogi19l/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3938522Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn2y4xxqi 2022-12-01T10:36:33.3938783Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn2y4xxqi/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3939014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3939239Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3939463Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3939688Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3939895Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3940118Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3940336Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3940566Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3940660Z ok (5.947s) 2022-12-01T10:36:33.3940680Z 2022-12-01T10:36:33.3941003Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3941125Z Ran 1 test in 5.948s 2022-12-01T10:36:33.3941144Z 2022-12-01T10:36:33.3941232Z OK 2022-12-01T10:36:33.3941251Z 2022-12-01T10:36:33.3941356Z Generating XML reports... 
2022-12-01T10:36:33.3941814Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102702.xml 2022-12-01T10:36:33.3942183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3942354Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3942726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3942916Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3942936Z 2022-12-01T10:36:33.3943038Z Running tests... 2022-12-01T10:36:33.3943301Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3943595Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3943884Z test_ddp_comm_hook_allreduce_hook_nccl (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3944096Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6383 2022-12-01T10:36:33.3944301Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6384 2022-12-01T10:36:33.3944666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3944833Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3945270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3945459Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3945820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3945972Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3946336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3946525Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3946748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3946961Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3947222Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxrawye87 2022-12-01T10:36:33.3947486Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxrawye87/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3947735Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzchz8_s4 2022-12-01T10:36:33.3947984Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzchz8_s4/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3948088Z ok (6.434s) 2022-12-01T10:36:33.3948107Z 2022-12-01T10:36:33.3948366Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3948470Z Ran 1 test in 6.434s 2022-12-01T10:36:33.3948489Z 2022-12-01T10:36:33.3948575Z OK 2022-12-01T10:36:33.3948595Z 2022-12-01T10:36:33.3948710Z Generating XML reports... 
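test_ddp_comm_hook_allreduce_hook_nccl above, and the related cases below, exercise DDP communication hooks. A minimal sketch of registering the built-in allreduce hook (hypothetical model; state=None means the default process group):

import torch.nn as nn
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks
from torch.nn.parallel import DistributedDataParallel as DDP

# assumes init_process_group(backend="nccl", ...) has already run
model = nn.Linear(10, 10).cuda()
ddp_model = DDP(model, device_ids=[0])
# allreduce_hook reproduces DDP's default gradient all-reduce as a comm hook
ddp_model.register_comm_hook(state=None, hook=default_hooks.allreduce_hook)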
2022-12-01T10:36:33.3949157Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102710.xml 2022-12-01T10:36:33.3949527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3949696Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3950107Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3950305Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3950325Z 2022-12-01T10:36:33.3950430Z Running tests... 2022-12-01T10:36:33.3950691Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3950996Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3951300Z test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3951516Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6504 2022-12-01T10:36:33.3951730Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6505 2022-12-01T10:36:33.3952082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3952255Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3952626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3952806Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3953168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3953335Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3953697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3953951Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3954178Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3954390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3954641Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5ddzxs1d 2022-12-01T10:36:33.3954905Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5ddzxs1d/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3955152Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbwla3760 2022-12-01T10:36:33.3955407Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbwla3760/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3955505Z ok (6.434s) 2022-12-01T10:36:33.3955526Z 2022-12-01T10:36:33.3955791Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3955898Z Ran 1 test in 6.435s 2022-12-01T10:36:33.3955922Z 2022-12-01T10:36:33.3955996Z OK 2022-12-01T10:36:33.3956028Z 2022-12-01T10:36:33.3956133Z Generating XML reports... 
2022-12-01T10:36:33.3956588Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102719.xml 2022-12-01T10:36:33.3956950Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3957114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3957482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3957663Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3957683Z 2022-12-01T10:36:33.3957781Z Running tests... 2022-12-01T10:36:33.3958035Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3958328Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3958628Z test_ddp_comm_hook_allreduce_hook_nccl_static_graph (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3958894Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6625 2022-12-01T10:36:33.3959117Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6626 2022-12-01T10:36:33.3959483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3959658Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3960028Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3960212Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3960571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3960732Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3961101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3961285Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3961508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3961726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3961981Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvotwubme 2022-12-01T10:36:33.3962249Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvotwubme/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3962725Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxyzogaj5 2022-12-01T10:36:33.3963078Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxyzogaj5/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3963175Z ok (6.340s) 2022-12-01T10:36:33.3963195Z 2022-12-01T10:36:33.3963468Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3963578Z Ran 1 test in 6.340s 2022-12-01T10:36:33.3963597Z 2022-12-01T10:36:33.3963689Z OK 2022-12-01T10:36:33.3963708Z 2022-12-01T10:36:33.3963829Z Generating XML reports... 
2022-12-01T10:36:33.3964291Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102728.xml 2022-12-01T10:36:33.3964661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3964817Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3965191Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3965387Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3965407Z 2022-12-01T10:36:33.3965514Z Running tests... 2022-12-01T10:36:33.3965778Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3966087Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3966360Z test_ddp_comm_hook_allreduce_with_then_hook_nccl (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3966647Z This unit test verifies whether a DDP communication hook that calls allreduce and then ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3966857Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6746 2022-12-01T10:36:33.3967053Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6747 2022-12-01T10:36:33.3967422Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3967595Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3968039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3968245Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3968614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3968786Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3969156Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3969344Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3969554Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3969776Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3970035Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoludr1ki 2022-12-01T10:36:33.3970305Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoludr1ki/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3970551Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7_bt61cv 2022-12-01T10:36:33.3970813Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7_bt61cv/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3970914Z ok (6.441s) 2022-12-01T10:36:33.3970934Z 2022-12-01T10:36:33.3971198Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3971294Z Ran 1 test in 6.441s 2022-12-01T10:36:33.3971313Z 2022-12-01T10:36:33.3971401Z OK 2022-12-01T10:36:33.3971420Z 2022-12-01T10:36:33.3971539Z Generating XML reports... 
2022-12-01T10:36:33.3971996Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102736.xml 2022-12-01T10:36:33.3972436Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3972614Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3972989Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3973176Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3973196Z 2022-12-01T10:36:33.3973300Z Running tests... 2022-12-01T10:36:33.3973546Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3973856Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3974083Z test_ddp_comm_hook_future_passing_gpu_nccl (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.3974372Z This unit test verifies whether the Future object is passed properly using nccl backend. ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3974591Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6867 2022-12-01T10:36:33.3974806Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6868 2022-12-01T10:36:33.3975172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3975346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3975704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3975889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3976251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3976424Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3976797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3977041Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3977278Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3977504Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3977762Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkptvdy4k 2022-12-01T10:36:33.3978014Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkptvdy4k/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3978263Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnljq_k0a 2022-12-01T10:36:33.3978526Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnljq_k0a/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3978630Z ok (6.445s) 2022-12-01T10:36:33.3978650Z 2022-12-01T10:36:33.3978919Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3979031Z Ran 1 test in 6.445s 2022-12-01T10:36:33.3979054Z 2022-12-01T10:36:33.3979150Z OK 2022-12-01T10:36:33.3979169Z 2022-12-01T10:36:33.3979289Z Generating XML reports... 
2022-12-01T10:36:33.3979731Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102745.xml 2022-12-01T10:36:33.3980102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3980274Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3980642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3980827Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3980898Z 2022-12-01T10:36:33.3981012Z Running tests... 2022-12-01T10:36:33.3981280Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3981587Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3981855Z test_ddp_multi_device_module_config (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3982072Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6988 2022-12-01T10:36:33.3982285Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6989 2022-12-01T10:36:33.3982654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3982827Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3983203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3983395Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3983766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3983939Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3984292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3984478Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3984704Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3984927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3985072Z skip: Need at least 4 CUDA devices (3.711s) 2022-12-01T10:36:33.3985092Z 2022-12-01T10:36:33.3985353Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3985469Z Ran 1 test in 3.711s 2022-12-01T10:36:33.3985488Z 2022-12-01T10:36:33.3985592Z OK (skipped=1) 2022-12-01T10:36:33.3985611Z 2022-12-01T10:36:33.3985734Z Generating XML reports... 
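test_ddp_multi_device_module_config above was skipped because this runner exposes fewer than four GPUs. One way such a guard can be expressed (a generic unittest sketch; the suite's actual skip decorator may differ):

import unittest
import torch

class Example(unittest.TestCase):
    @unittest.skipIf(torch.cuda.device_count() < 4, "Need at least 4 CUDA devices")
    def test_multi_device(self):
        # body only runs on hosts with at least 4 visible CUDA devices
        pass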
2022-12-01T10:36:33.3986228Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102754.xml 2022-12-01T10:36:33.3986611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3986788Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3987163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3987352Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3987371Z 2022-12-01T10:36:33.3987479Z Running tests... 2022-12-01T10:36:33.3987740Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3988055Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3988313Z test_ddp_weight_sharing (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3988529Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7091 2022-12-01T10:36:33.3988742Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7092 2022-12-01T10:36:33.3989106Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3989280Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3989656Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3989848Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3990210Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3990459Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3990825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3991013Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3991246Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.3991472Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.3991725Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2k5fw4ki 2022-12-01T10:36:33.3991991Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2k5fw4ki/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3992242Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgbj3n8iy 2022-12-01T10:36:33.3992511Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgbj3n8iy/_remote_module_non_scriptable.py 2022-12-01T10:36:33.3992732Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3992961Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3993193Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3993416Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:36:33.3993643Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3993866Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3994091Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3994314Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.3994403Z ok (6.744s) 2022-12-01T10:36:33.3994435Z 2022-12-01T10:36:33.3994740Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3994864Z Ran 1 test in 6.744s 2022-12-01T10:36:33.3994883Z 2022-12-01T10:36:33.3994976Z OK 2022-12-01T10:36:33.3994995Z 2022-12-01T10:36:33.3995121Z Generating XML reports... 2022-12-01T10:36:33.3995587Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102800.xml 2022-12-01T10:36:33.3995956Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3996128Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3996499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3996678Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3996697Z 2022-12-01T10:36:33.3996803Z Running tests... 2022-12-01T10:36:33.3997063Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.3997376Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.3997644Z test_ddp_with_lazy_parameters (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.3997856Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7212 2022-12-01T10:36:33.3998064Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7213 2022-12-01T10:36:33.3998434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3998588Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.3998964Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.3999217Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.3999583Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.3999754Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4000127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4000315Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4000543Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4001081Z /opt/conda/lib/python3.10/site-packages/torch/nn/modules/lazy.py:180: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment. 
2022-12-01T10:36:33.4001341Z warnings.warn('Lazy modules are a new feature under heavy development ' 2022-12-01T10:36:33.4001563Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4002098Z /opt/conda/lib/python3.10/site-packages/torch/nn/modules/lazy.py:180: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment. 2022-12-01T10:36:33.4002367Z warnings.warn('Lazy modules are a new feature under heavy development ' 2022-12-01T10:36:33.4002855Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4f9g348j 2022-12-01T10:36:33.4003124Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4f9g348j/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4003376Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6aopwwat 2022-12-01T10:36:33.4003643Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6aopwwat/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4003751Z ok (3.732s) 2022-12-01T10:36:33.4003771Z 2022-12-01T10:36:33.4004025Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4004229Z Ran 1 test in 3.733s 2022-12-01T10:36:33.4004253Z 2022-12-01T10:36:33.4004351Z OK 2022-12-01T10:36:33.4004370Z 2022-12-01T10:36:33.4004496Z Generating XML reports... 2022-12-01T10:36:33.4004961Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102809.xml 2022-12-01T10:36:33.4005330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4005502Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4005876Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4006066Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4006090Z 2022-12-01T10:36:33.4006181Z Running tests... 2022-12-01T10:36:33.4006440Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4006754Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4007034Z test_default_ddp_comm_hooks_nccl (__main__.DistributedDataParallelTest) ... 
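The lazy.py:180 UserWarning above comes from lazy modules, whose parameters are only materialized on the first forward pass (the "lazy parameters" the preceding test covers). A minimal sketch with a hypothetical shape:

import torch
import torch.nn as nn

lazy = nn.LazyLinear(out_features=8)   # weight is uninitialized at construction
y = lazy(torch.randn(2, 16))           # first call infers in_features=16
print(lazy.weight.shape)               # torch.Size([8, 16])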
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4007250Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7319 2022-12-01T10:36:33.4007465Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7320 2022-12-01T10:36:33.4007833Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4008005Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4008364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4008632Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4009005Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4009179Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4009545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4009730Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4009952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4010176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4010429Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppwfow43k 2022-12-01T10:36:33.4010686Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppwfow43k/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4010939Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6q6zv4c6 2022-12-01T10:36:33.4011205Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6q6zv4c6/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4011307Z ok (6.440s) 2022-12-01T10:36:33.4011327Z 2022-12-01T10:36:33.4011587Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4011700Z Ran 1 test in 6.440s 2022-12-01T10:36:33.4011719Z 2022-12-01T10:36:33.4011813Z OK 2022-12-01T10:36:33.4011832Z 2022-12-01T10:36:33.4011956Z Generating XML reports... 2022-12-01T10:36:33.4012396Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102815.xml 2022-12-01T10:36:33.4012765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4012942Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4013371Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4013570Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4013590Z 2022-12-01T10:36:33.4013698Z Running tests... 2022-12-01T10:36:33.4013961Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4014275Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4014559Z test_default_ddp_comm_hooks_nccl_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4014759Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7440 2022-12-01T10:36:33.4014970Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7441 2022-12-01T10:36:33.4015344Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4015520Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4015888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4016062Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4016436Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4016624Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4016983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4017172Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4017398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4017678Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4017938Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1zhd4503 2022-12-01T10:36:33.4018206Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1zhd4503/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4018459Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0gdih4_5 2022-12-01T10:36:33.4018723Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0gdih4_5/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4018824Z ok (6.445s) 2022-12-01T10:36:33.4018844Z 2022-12-01T10:36:33.4019097Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4019208Z Ran 1 test in 6.445s 2022-12-01T10:36:33.4019228Z 2022-12-01T10:36:33.4019319Z OK 2022-12-01T10:36:33.4019338Z 2022-12-01T10:36:33.4019457Z Generating XML reports... 2022-12-01T10:36:33.4019916Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102823.xml 2022-12-01T10:36:33.4020286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4020454Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4020832Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4021004Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4021039Z 2022-12-01T10:36:33.4021130Z Running tests... 2022-12-01T10:36:33.4021392Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4021751Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4022020Z test_failure_recovery (__main__.DistributedDataParallelTest) ... 
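The test_default_ddp_comm_hooks_nccl cases above cover the stock hooks shipped with DDP. Besides the allreduce_hook shown earlier, fp16_compress_hook is another of these defaults; a sketch of swapping it in (hypothetical model, assuming an initialized NCCL process group):

import torch.nn as nn
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks
from torch.nn.parallel import DistributedDataParallel as DDP

model = nn.Linear(10, 10).cuda()
ddp_model = DDP(model, device_ids=[0])
# compresses gradients to float16 before the all-reduce, then decompresses
ddp_model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)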
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4022245Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7561 2022-12-01T10:36:33.4022530Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7562 2022-12-01T10:36:33.4022920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4023095Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4023454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4023642Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4024008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4024182Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4024555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4024745Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4024969Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4025187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4025425Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp22go757i 2022-12-01T10:36:33.4025690Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp22go757i/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4025939Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptqxzgart 2022-12-01T10:36:33.4026206Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptqxzgart/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4026438Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4026734Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4026970Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4027199Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4027284Z ok (7.126s) 2022-12-01T10:36:33.4027321Z 2022-12-01T10:36:33.4027575Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4027688Z Ran 1 test in 7.126s 2022-12-01T10:36:33.4027708Z 2022-12-01T10:36:33.4027800Z OK 2022-12-01T10:36:33.4027819Z 2022-12-01T10:36:33.4027942Z Generating XML reports... 2022-12-01T10:36:33.4028401Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102832.xml 2022-12-01T10:36:33.4028767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4028944Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4029323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4029495Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4029531Z 2022-12-01T10:36:33.4029623Z Running tests... 
2022-12-01T10:36:33.4029882Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4030191Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4030489Z test_find_unused_parameters_kwarg_debug_detail (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4031243Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82632 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.527s) 2022-12-01T10:36:33.4031266Z 2022-12-01T10:36:33.4031582Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4031704Z Ran 1 test in 1.527s 2022-12-01T10:36:33.4031723Z 2022-12-01T10:36:33.4031826Z OK (skipped=1) 2022-12-01T10:36:33.4031845Z 2022-12-01T10:36:33.4031970Z Generating XML reports... 2022-12-01T10:36:33.4032412Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102841.xml 2022-12-01T10:36:33.4032781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4032956Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4033335Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4033529Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4033548Z 2022-12-01T10:36:33.4033658Z Running tests... 2022-12-01T10:36:33.4033921Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4034234Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4034513Z test_find_unused_parameters_kwarg_debug_info (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4035256Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/83301 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.506s) 2022-12-01T10:36:33.4035295Z 2022-12-01T10:36:33.4035539Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4035765Z Ran 1 test in 1.506s 2022-12-01T10:36:33.4035785Z 2022-12-01T10:36:33.4035895Z OK (skipped=1) 2022-12-01T10:36:33.4035914Z 2022-12-01T10:36:33.4036041Z Generating XML reports... 2022-12-01T10:36:33.4036501Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102845.xml 2022-12-01T10:36:33.4036870Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4037044Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4037420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4037593Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4037630Z 2022-12-01T10:36:33.4037719Z Running tests... 
2022-12-01T10:36:33.4037979Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4038296Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4038594Z test_find_unused_parameters_kwarg_debug_off (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4039335Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82385 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.529s) 2022-12-01T10:36:33.4039356Z 2022-12-01T10:36:33.4039614Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4039726Z Ran 1 test in 1.529s 2022-12-01T10:36:33.4039746Z 2022-12-01T10:36:33.4039854Z OK (skipped=1) 2022-12-01T10:36:33.4039873Z 2022-12-01T10:36:33.4039989Z Generating XML reports... 2022-12-01T10:36:33.4040430Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102849.xml 2022-12-01T10:36:33.4040858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4041038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4041416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4041608Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4041628Z 2022-12-01T10:36:33.4041737Z Running tests... 2022-12-01T10:36:33.4041997Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4042309Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4042827Z test_find_unused_parameters_kwarg_grad_is_view_debug_detail (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4043603Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82979 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.512s) 2022-12-01T10:36:33.4043642Z 2022-12-01T10:36:33.4043886Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4043999Z Ran 1 test in 1.512s 2022-12-01T10:36:33.4044018Z 2022-12-01T10:36:33.4044124Z OK (skipped=1) 2022-12-01T10:36:33.4044143Z 2022-12-01T10:36:33.4044266Z Generating XML reports... 2022-12-01T10:36:33.4044722Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102853.xml 2022-12-01T10:36:33.4045095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4045365Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4045753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4045925Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4045963Z 2022-12-01T10:36:33.4046053Z Running tests... 
2022-12-01T10:36:33.4046315Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4046623Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4046929Z test_find_unused_parameters_kwarg_grad_is_view_debug_info (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4047670Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82400 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.520s) 2022-12-01T10:36:33.4047694Z 2022-12-01T10:36:33.4047956Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4048067Z Ran 1 test in 1.520s 2022-12-01T10:36:33.4048087Z 2022-12-01T10:36:33.4048198Z OK (skipped=1) 2022-12-01T10:36:33.4048217Z 2022-12-01T10:36:33.4048336Z Generating XML reports... 2022-12-01T10:36:33.4048772Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102856.xml 2022-12-01T10:36:33.4049141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4049314Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4049690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4049882Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4049901Z 2022-12-01T10:36:33.4050004Z Running tests... 2022-12-01T10:36:33.4050336Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4050666Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4050976Z test_find_unused_parameters_kwarg_grad_is_view_debug_off (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4051705Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82500 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.536s) 2022-12-01T10:36:33.4051742Z 2022-12-01T10:36:33.4051987Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4052105Z Ran 1 test in 1.536s 2022-12-01T10:36:33.4052125Z 2022-12-01T10:36:33.4052230Z OK (skipped=1) 2022-12-01T10:36:33.4052249Z 2022-12-01T10:36:33.4052368Z Generating XML reports... 2022-12-01T10:36:33.4052828Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102900.xml 2022-12-01T10:36:33.4053202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4053379Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4053755Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4053927Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4053963Z 2022-12-01T10:36:33.4054053Z Running tests... 
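The skip messages above state that these disabled tests can still be run locally when the CI environment variable is not set and the --import-disabled-tests flag is not passed. A minimal sketch of such a local run on a machine with enough GPUs, assuming a standard pytorch checkout; the path and the chosen test name below are illustrative, not taken from this log:

    # Hypothetical local repro based on the skip message above; checkout path
    # and test name are illustrative, not taken from this log.
    import os
    import subprocess

    env = dict(os.environ)
    env.pop("CI", None)  # skip message: "make sure CI is not set"
    # Do not pass --import-disabled-tests, so the disabled-test list is ignored.
    subprocess.run(
        [
            "python",
            "test/distributed/test_c10d_nccl.py",
            "-v",
            "-k",
            "test_find_unused_parameters_kwarg_debug_detail",
        ],
        env=env,
        check=True,
    )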
2022-12-01T10:36:33.4054314Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4054694Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4054938Z test_fp16 (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4055158Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7897 2022-12-01T10:36:33.4055376Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7898 2022-12-01T10:36:33.4055747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4055923Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4056282Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4056467Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4056828Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4057002Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4057376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4057564Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4057797Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4058025Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4058262Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp26thetfc 2022-12-01T10:36:33.4058530Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp26thetfc/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4058783Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpptxloxkt 2022-12-01T10:36:33.4059053Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpptxloxkt/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4059154Z ok (6.943s) 2022-12-01T10:36:33.4059174Z 2022-12-01T10:36:33.4059490Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4059608Z Ran 1 test in 6.943s 2022-12-01T10:36:33.4059627Z 2022-12-01T10:36:33.4059719Z OK 2022-12-01T10:36:33.4059738Z 2022-12-01T10:36:33.4059860Z Generating XML reports... 2022-12-01T10:36:33.4060304Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102904.xml 2022-12-01T10:36:33.4060675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4060847Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4061223Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4061414Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4061434Z 2022-12-01T10:36:33.4061538Z Running tests... 
2022-12-01T10:36:33.4061799Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4062110Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4062376Z test_fp16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4062587Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8018 2022-12-01T10:36:33.4062796Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8019 2022-12-01T10:36:33.4063167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4063338Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4063779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4063967Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4064328Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4064495Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4064855Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4065040Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4065266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4065806Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4066035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4066607Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4066863Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj6f0knz5 2022-12-01T10:36:33.4067135Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj6f0knz5/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4067389Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc9aeaz2j 2022-12-01T10:36:33.4067659Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc9aeaz2j/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4067745Z ok (6.548s) 2022-12-01T10:36:33.4067782Z 2022-12-01T10:36:33.4068102Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4068223Z Ran 1 test in 6.548s 2022-12-01T10:36:33.4068243Z 2022-12-01T10:36:33.4068339Z OK 2022-12-01T10:36:33.4068358Z 2022-12-01T10:36:33.4068483Z Generating XML reports... 
2022-12-01T10:36:33.4068943Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102913.xml 2022-12-01T10:36:33.4069307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4069473Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4069844Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4070022Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4070041Z 2022-12-01T10:36:33.4070147Z Running tests... 2022-12-01T10:36:33.4070409Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4070719Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4070994Z test_fp16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4071207Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8139 2022-12-01T10:36:33.4071419Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8140 2022-12-01T10:36:33.4071787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4071945Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4072388Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4072579Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4072949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4073119Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4073492Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4073677Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4073903Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4074444Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4074673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4075186Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4075442Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpshgvuvlu 2022-12-01T10:36:33.4075716Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpshgvuvlu/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4075969Z 
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuenlpbs5 2022-12-01T10:36:33.4076239Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuenlpbs5/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4076338Z ok (6.417s) 2022-12-01T10:36:33.4076415Z 2022-12-01T10:36:33.4076696Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4076814Z Ran 1 test in 6.417s 2022-12-01T10:36:33.4076833Z 2022-12-01T10:36:33.4076925Z OK 2022-12-01T10:36:33.4076943Z 2022-12-01T10:36:33.4077049Z Generating XML reports... 2022-12-01T10:36:33.4077505Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102922.xml 2022-12-01T10:36:33.4077879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4078049Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4078426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4078619Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4078639Z 2022-12-01T10:36:33.4078750Z Running tests... 2022-12-01T10:36:33.4079011Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4079318Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4079565Z test_fp16_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4079780Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8260 2022-12-01T10:36:33.4079988Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8261 2022-12-01T10:36:33.4080350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4080524Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4080972Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4081164Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4081533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4081689Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4082061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4082245Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4082697Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4082934Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4083191Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpenytskg4 2022-12-01T10:36:33.4083462Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpenytskg4/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4083714Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0xr_n4ik 2022-12-01T10:36:33.4083979Z INFO:torch.distributed.nn.jit.instantiator:Writing 
/tmp/tmp0xr_n4ik/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4084064Z ok (6.928s) 2022-12-01T10:36:33.4084083Z 2022-12-01T10:36:33.4084356Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4084465Z Ran 1 test in 6.928s 2022-12-01T10:36:33.4084485Z 2022-12-01T10:36:33.4084575Z OK 2022-12-01T10:36:33.4084594Z 2022-12-01T10:36:33.4084713Z Generating XML reports... 2022-12-01T10:36:33.4085168Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102931.xml 2022-12-01T10:36:33.4085538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4085791Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4086168Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4086360Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4086380Z 2022-12-01T10:36:33.4086488Z Running tests... 2022-12-01T10:36:33.4086751Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4087060Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4087372Z test_grad_layout_1devicemodule_1replicaperprocess (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4087589Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8381 2022-12-01T10:36:33.4087803Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8382 2022-12-01T10:36:33.4088174Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4088333Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4088714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4088900Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4089263Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4089430Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4089801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4090073Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4090301Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4090508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4090760Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe7x_6r84 2022-12-01T10:36:33.4091023Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe7x_6r84/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4091268Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkca2yl9j 2022-12-01T10:36:33.4091533Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkca2yl9j/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4091765Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:36:33.4091996Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4092230Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4092461Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4092674Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4092897Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4093121Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4093347Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4093570Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4093789Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4094010Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4094236Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4094496Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4094731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4094959Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4095181Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4095403Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4095631Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4095854Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4096078Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4096307Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4096510Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4096743Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4096954Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4097174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4097392Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4097613Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4097832Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4098114Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4098321Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4098540Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:36:33.4098761Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4098987Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4099208Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4099429Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4099648Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4099748Z ok (8.564s) 2022-12-01T10:36:33.4099772Z 2022-12-01T10:36:33.4100025Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4100139Z Ran 1 test in 8.564s 2022-12-01T10:36:33.4100158Z 2022-12-01T10:36:33.4100252Z OK 2022-12-01T10:36:33.4100275Z 2022-12-01T10:36:33.4100402Z Generating XML reports... 2022-12-01T10:36:33.4100860Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102940.xml 2022-12-01T10:36:33.4101229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4101403Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4101779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4101969Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4101988Z 2022-12-01T10:36:33.4102079Z Running tests... 2022-12-01T10:36:33.4102345Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4102655Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4102987Z test_grad_layout_2devicemodule (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4103215Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8502 2022-12-01T10:36:33.4103429Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8503 2022-12-01T10:36:33.4103800Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4103973Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4104332Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4104522Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4104886Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4105061Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4105435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4105622Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4105848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4106073Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4106221Z skip: Need at least 4 CUDA devices (3.743s) 2022-12-01T10:36:33.4106241Z 2022-12-01T10:36:33.4106485Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4106597Z Ran 1 test in 3.743s 2022-12-01T10:36:33.4106671Z 2022-12-01T10:36:33.4106785Z OK (skipped=1) 2022-12-01T10:36:33.4106805Z 2022-12-01T10:36:33.4106924Z Generating XML reports... 2022-12-01T10:36:33.4107383Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102951.xml 2022-12-01T10:36:33.4107753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4107927Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4108303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4108490Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4108510Z 2022-12-01T10:36:33.4108601Z Running tests... 2022-12-01T10:36:33.4108864Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4109180Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4109456Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4109671Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8605 2022-12-01T10:36:33.4109883Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8606 2022-12-01T10:36:33.4110250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4110421Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4110776Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4110965Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4111327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4111503Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4111924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4112123Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4112348Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4112897Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4113439Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4113979Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4114521Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4115058Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4115661Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 
2022-12-01T10:36:33.4115892Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4116429Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4116972Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4117497Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4118028Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4118614Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4119159Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4119263Z ok (3.735s) 2022-12-01T10:36:33.4119285Z 2022-12-01T10:36:33.4119543Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4119659Z Ran 1 test in 3.736s 2022-12-01T10:36:33.4119678Z 2022-12-01T10:36:33.4119772Z OK 2022-12-01T10:36:33.4119791Z 2022-12-01T10:36:33.4119919Z Generating XML reports... 2022-12-01T10:36:33.4120381Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102957.xml 2022-12-01T10:36:33.4120750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4120929Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4121307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4121482Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4121518Z 2022-12-01T10:36:33.4121608Z Running tests... 
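The PowerSGD config lines above are logged when a PowerSGDState is constructed and registered as a DDP communication hook, which is what the test_fp16_compress_wrapper_* and test_invalid_powerSGD_state runs exercise. A minimal sketch of that registration, assuming an already-initialized NCCL process group; model and rank are placeholders, not taken from this log:

    # Sketch of the comm-hook registration behind the PowerSGD config log lines;
    # `model` and `rank` are placeholders, not taken from this log.
    from torch.distributed.algorithms.ddp_comm_hooks import default_hooks, powerSGD_hook
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_with_powersgd(model, rank):
        state = powerSGD_hook.PowerSGDState(
            process_group=None,           # use the default (NCCL) process group
            matrix_approximation_rank=1,  # values mirrored in the log lines above
            start_powerSGD_iter=1000,
        )
        ddp_model = DDP(model, device_ids=[rank])
        # fp16_compress_wrapper casts gradients to fp16 around the inner hook,
        # as in the test_fp16_compress_wrapper_* runs earlier in this log.
        ddp_model.register_comm_hook(
            state, default_hooks.fp16_compress_wrapper(powerSGD_hook.powerSGD_hook)
        )
        return ddp_model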
2022-12-01T10:36:33.4121937Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4122252Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4122775Z test_multiple_outputs_multiple_backward (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4123000Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8708 2022-12-01T10:36:33.4123215Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8709 2022-12-01T10:36:33.4123598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4123772Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4124132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4124321Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4124685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4124856Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4125229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4125418Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4125647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4125870Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4126109Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa5u6t7yk 2022-12-01T10:36:33.4126377Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa5u6t7yk/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4126634Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv0_lyvre 2022-12-01T10:36:33.4126990Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv0_lyvre/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4127108Z ok (6.944s) 2022-12-01T10:36:33.4127128Z 2022-12-01T10:36:33.4127402Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4127515Z Ran 1 test in 6.944s 2022-12-01T10:36:33.4127535Z 2022-12-01T10:36:33.4127630Z OK 2022-12-01T10:36:33.4127649Z 2022-12-01T10:36:33.4127772Z Generating XML reports... 2022-12-01T10:36:33.4128216Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103003.xml 2022-12-01T10:36:33.4128585Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4128758Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4129138Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4129332Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4129356Z 2022-12-01T10:36:33.4129469Z Running tests... 
2022-12-01T10:36:33.4129716Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4130026Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4130337Z test_multiple_outputs_multiple_backward_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4130554Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8829 2022-12-01T10:36:33.4130764Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8830 2022-12-01T10:36:33.4131130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4131392Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4131775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4131950Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4132308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4132478Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4132852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4133035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4133263Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4133487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4133748Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi61xh0nh 2022-12-01T10:36:33.4134003Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi61xh0nh/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4134255Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyd7t7gn2 2022-12-01T10:36:33.4134520Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyd7t7gn2/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4134619Z ok (6.928s) 2022-12-01T10:36:33.4134639Z 2022-12-01T10:36:33.4134902Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4135012Z Ran 1 test in 6.928s 2022-12-01T10:36:33.4135032Z 2022-12-01T10:36:33.4135121Z OK 2022-12-01T10:36:33.4135140Z 2022-12-01T10:36:33.4135264Z Generating XML reports... 2022-12-01T10:36:33.4135719Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103012.xml 2022-12-01T10:36:33.4136074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4136299Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4136688Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4136874Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4136894Z 2022-12-01T10:36:33.4136999Z Running tests... 
2022-12-01T10:36:33.4137254Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4137560Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4137862Z test_nccl_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4138081Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8950 2022-12-01T10:36:33.4138277Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8951 2022-12-01T10:36:33.4138648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4138818Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4139192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4139379Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4139737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4139904Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4140273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4140504Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4140733Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4140958Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4141210Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1csm_0_l 2022-12-01T10:36:33.4141473Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1csm_0_l/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4141724Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuxyxbu4l 2022-12-01T10:36:33.4141987Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuxyxbu4l/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4142217Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4142455Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4142540Z ok (6.924s) 2022-12-01T10:36:33.4142561Z 2022-12-01T10:36:33.4142835Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4142946Z Ran 1 test in 6.924s 2022-12-01T10:36:33.4142966Z 2022-12-01T10:36:33.4143054Z OK 2022-12-01T10:36:33.4143073Z 2022-12-01T10:36:33.4143190Z Generating XML reports... 
2022-12-01T10:36:33.4143646Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103021.xml 2022-12-01T10:36:33.4144015Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4144186Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4144543Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4144736Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4144756Z 2022-12-01T10:36:33.4144861Z Running tests... 2022-12-01T10:36:33.4145173Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4145494Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4145804Z test_nccl_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4146021Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9071 2022-12-01T10:36:33.4146233Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9072 2022-12-01T10:36:33.4146598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4146755Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4147132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4147327Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4147688Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4147858Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4148222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4148405Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4148629Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4148832Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4149087Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwqg8bh9o 2022-12-01T10:36:33.4149413Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwqg8bh9o/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4149668Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb8cskapy 2022-12-01T10:36:33.4149931Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb8cskapy/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4150164Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4150396Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:36:33.4150496Z ok (6.925s) 2022-12-01T10:36:33.4150517Z 2022-12-01T10:36:33.4150787Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4150882Z Ran 1 test in 6.925s 2022-12-01T10:36:33.4150902Z 2022-12-01T10:36:33.4150990Z OK 2022-12-01T10:36:33.4151009Z 2022-12-01T10:36:33.4151132Z Generating XML reports... 2022-12-01T10:36:33.4151590Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103030.xml 2022-12-01T10:36:33.4151962Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4152135Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4152509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4152697Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4152716Z 2022-12-01T10:36:33.4152808Z Running tests... 2022-12-01T10:36:33.4153064Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4153372Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4153649Z test_nccl_backend_2gpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4153865Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9192 2022-12-01T10:36:33.4154137Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9193 2022-12-01T10:36:33.4154521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4154693Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4155049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4155238Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4155602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4155775Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4156149Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4156336Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4156564Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4156789Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4156936Z skip: Need at least 4 CUDA devices (3.745s) 2022-12-01T10:36:33.4156955Z 2022-12-01T10:36:33.4157198Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4157311Z Ran 1 test in 3.745s 2022-12-01T10:36:33.4157331Z 2022-12-01T10:36:33.4157436Z OK (skipped=1) 2022-12-01T10:36:33.4157455Z 2022-12-01T10:36:33.4157574Z Generating XML reports... 
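Editor's note: test_nccl_backend_2gpu_module (and the 4-GPU variants below) are skipped because this runner exposes fewer CUDA devices than the test needs. A hedged sketch of the kind of guard that produces such a skip; the PyTorch suite uses its own helper (along the lines of skip_if_lt_x_gpu), so the decorator below is only an approximation.

    import unittest
    import torch

    # Hypothetical guard: skip unless enough CUDA devices are visible.
    def require_gpus(n):
        return unittest.skipIf(
            torch.cuda.device_count() < n,
            f"Need at least {n} CUDA devices",
        )

    class Example(unittest.TestCase):
        @require_gpus(4)
        def test_two_gpus_per_rank(self):
            ...  # body only runs when 4+ GPUs are present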
2022-12-01T10:36:33.4158031Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103039.xml 2022-12-01T10:36:33.4158467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4158644Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4159021Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4159210Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4159230Z 2022-12-01T10:36:33.4159320Z Running tests... 2022-12-01T10:36:33.4159584Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4159891Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4160166Z test_nccl_backend_4gpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4160383Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9295 2022-12-01T10:36:33.4160602Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9296 2022-12-01T10:36:33.4160973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4161146Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4161505Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4161694Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4162058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4162227Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4162832Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4163034Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4163339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4163574Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4163704Z skip: Need at least 8 CUDA devices (3.740s) 2022-12-01T10:36:33.4163740Z 2022-12-01T10:36:33.4163993Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4164104Z Ran 1 test in 3.740s 2022-12-01T10:36:33.4164123Z 2022-12-01T10:36:33.4164229Z OK (skipped=1) 2022-12-01T10:36:33.4164248Z 2022-12-01T10:36:33.4164370Z Generating XML reports... 
2022-12-01T10:36:33.4164830Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103045.xml 2022-12-01T10:36:33.4165197Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4165375Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4165754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4165928Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4165963Z 2022-12-01T10:36:33.4166054Z Running tests... 2022-12-01T10:36:33.4166317Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4166667Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4166959Z test_nccl_backend_multi_device_ids_not_allowed (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4167173Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9398 2022-12-01T10:36:33.4167385Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9399 2022-12-01T10:36:33.4167843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4168020Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4168381Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4168570Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4168932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4169104Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4169479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4169664Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4169898Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4170128Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4170213Z ok (5.237s) 2022-12-01T10:36:33.4170234Z 2022-12-01T10:36:33.4170494Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4170606Z Ran 1 test in 5.238s 2022-12-01T10:36:33.4170626Z 2022-12-01T10:36:33.4170717Z OK 2022-12-01T10:36:33.4170737Z 2022-12-01T10:36:33.4170858Z Generating XML reports... 
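Editor's note: test_nccl_backend_multi_device_ids_not_allowed passing is consistent with DDP rejecting a device_ids list that names more than one device for a single-device module. A rough sketch of that behaviour; the exact error type and message are assumptions, not copied from the test.

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Assumes a NCCL process group is already initialised as in the earlier sketch.
    model = torch.nn.Linear(10, 10).cuda(0)
    try:
        # Several device ids for a single-device module should be rejected.
        DDP(model, device_ids=[0, 1])
    except ValueError as err:  # assumed error type
        print("rejected as expected:", err)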
2022-12-01T10:36:33.4171313Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103051.xml 2022-12-01T10:36:33.4171678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4171850Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4172228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4172402Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4172530Z 2022-12-01T10:36:33.4172653Z Running tests... 2022-12-01T10:36:33.4172915Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4173223Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4173527Z test_nccl_backend_multi_device_module_device_ids_None (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4173744Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9507 2022-12-01T10:36:33.4173957Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9508 2022-12-01T10:36:33.4174319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4174481Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4174860Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4175048Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4175410Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4175582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4175952Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4176139Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4176365Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4176592Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4176788Z skip: Need at least 4 CUDA devices (3.730s) 2022-12-01T10:36:33.4176808Z 2022-12-01T10:36:33.4177076Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4177187Z Ran 1 test in 3.730s 2022-12-01T10:36:33.4177206Z 2022-12-01T10:36:33.4177312Z OK (skipped=1) 2022-12-01T10:36:33.4177331Z 2022-12-01T10:36:33.4177451Z Generating XML reports... 
2022-12-01T10:36:33.4177906Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103059.xml 2022-12-01T10:36:33.4178272Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4178445Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4178801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4178995Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4179014Z 2022-12-01T10:36:33.4179122Z Running tests... 2022-12-01T10:36:33.4179389Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4179701Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4180008Z test_nccl_backend_single_device_module_device_ids_None (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4180222Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9610 2022-12-01T10:36:33.4180433Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9611 2022-12-01T10:36:33.4180801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4180960Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4181341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4181582Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4181961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4182131Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4182501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4182686Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4182910Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4183114Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4183368Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps_q1pa5r 2022-12-01T10:36:33.4183641Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps_q1pa5r/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4183897Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr_du5vgr 2022-12-01T10:36:33.4184162Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr_du5vgr/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4184393Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4184624Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:36:33.4184724Z ok (6.928s) 2022-12-01T10:36:33.4184744Z 2022-12-01T10:36:33.4185005Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4185102Z Ran 1 test in 6.928s 2022-12-01T10:36:33.4185121Z 2022-12-01T10:36:33.4185210Z OK 2022-12-01T10:36:33.4185229Z 2022-12-01T10:36:33.4185413Z Generating XML reports... 2022-12-01T10:36:33.4185872Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103105.xml 2022-12-01T10:36:33.4186244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4186417Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4186793Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4186981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4187000Z 2022-12-01T10:36:33.4187090Z Running tests... 2022-12-01T10:36:33.4187351Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4187663Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4187973Z test_nccl_backend_single_device_module_empty_device_ids (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4188193Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9731 2022-12-01T10:36:33.4188407Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9732 2022-12-01T10:36:33.4188772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4188943Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4189306Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4189463Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4189838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4190025Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4190406Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4190661Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4190901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4191126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4191380Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmzbm1xrk 2022-12-01T10:36:33.4191633Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmzbm1xrk/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4191886Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq7ys62b2 2022-12-01T10:36:33.4192145Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq7ys62b2/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4192384Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:36:33.4192621Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4192724Z ok (6.918s) 2022-12-01T10:36:33.4192744Z 2022-12-01T10:36:33.4193008Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4193120Z Ran 1 test in 6.918s 2022-12-01T10:36:33.4193140Z 2022-12-01T10:36:33.4193231Z OK 2022-12-01T10:36:33.4193250Z 2022-12-01T10:36:33.4193355Z Generating XML reports... 2022-12-01T10:36:33.4193808Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103114.xml 2022-12-01T10:36:33.4194176Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4194348Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4194787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4194976Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4195000Z 2022-12-01T10:36:33.4195106Z Running tests... 2022-12-01T10:36:33.4195362Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4195654Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4195931Z test_nccl_propagate_error_reason (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4196144Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9852 2022-12-01T10:36:33.4196355Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9853 2022-12-01T10:36:33.4196717Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4196891Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4197270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4197459Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4197820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4197976Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4198341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4198526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4198748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4198973Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4199080Z ok (22.378s) 2022-12-01T10:36:33.4199100Z 2022-12-01T10:36:33.4199363Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4199528Z Ran 1 test in 22.378s 2022-12-01T10:36:33.4199551Z 2022-12-01T10:36:33.4199630Z OK 2022-12-01T10:36:33.4199665Z 2022-12-01T10:36:33.4199774Z Generating XML reports... 
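Editor's note: several of the passing tests above log "Reducer buckets have been rebuilt in this iteration". That INFO line is emitted by DDP once gradient bucketing has been finalised after an early backward pass. A small sketch of a training step that produces it, assuming the process-group setup from the first sketch.

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Assumes dist.init_process_group("nccl", ...) has already run on this rank.
    model = DDP(torch.nn.Linear(10, 10).cuda(0), device_ids=[0])
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(2):
        opt.zero_grad()
        loss = model(torch.randn(8, 10, device="cuda:0")).sum()
        loss.backward()   # DDP logs the bucket-rebuild INFO line after the first iteration
        opt.step()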
2022-12-01T10:36:33.4200235Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103123.xml 2022-12-01T10:36:33.4200598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4200769Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4201135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4201322Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4201345Z 2022-12-01T10:36:33.4201453Z Running tests... 2022-12-01T10:36:33.4201712Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4202010Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4202192Z test_no_grad (__main__.DistributedDataParallelTest) 2022-12-01T10:36:33.4202611Z Note: this test can be sped up by only running it on a CPU module ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4202836Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9973 2022-12-01T10:36:33.4203052Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9974 2022-12-01T10:36:33.4203421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4203596Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4204069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4204245Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4204602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4204774Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4205146Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4205329Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4205554Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4205780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4206035Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkcdsbtqy 2022-12-01T10:36:33.4206309Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkcdsbtqy/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4206546Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2396zru8 2022-12-01T10:36:33.4206808Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2396zru8/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4206907Z ok (6.942s) 2022-12-01T10:36:33.4206929Z 2022-12-01T10:36:33.4207194Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4207305Z Ran 1 test in 6.942s 2022-12-01T10:36:33.4207324Z 2022-12-01T10:36:33.4207415Z OK 2022-12-01T10:36:33.4207433Z 2022-12-01T10:36:33.4207552Z Generating XML reports... 
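Editor's note: test_no_grad above (whose docstring observes it could be sped up by running on a CPU module) checks DDP's forward pass when autograd recording is disabled. A minimal sketch of the pattern being exercised, under the same assumed process-group setup.

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Assumes the NCCL process group and a CUDA device are already set up.
    ddp = DDP(torch.nn.Linear(10, 10).cuda(0), device_ids=[0])

    with torch.no_grad():
        out = ddp(torch.randn(8, 10, device="cuda:0"))

    # No backward pass happens here, so no gradient synchronisation is expected.
    assert not out.requires_grad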
2022-12-01T10:36:33.4208005Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103148.xml 2022-12-01T10:36:33.4208355Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4208532Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4208973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4209172Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4209194Z 2022-12-01T10:36:33.4209300Z Running tests... 2022-12-01T10:36:33.4209560Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4209867Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4210141Z test_param_layout_mismatch_error (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4210355Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10090 2022-12-01T10:36:33.4210556Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10091 2022-12-01T10:36:33.4210926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4211103Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4211479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4211667Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4212030Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4212201Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4212567Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4212735Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4213021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4213251Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4213503Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3g65_9fc 2022-12-01T10:36:33.4213767Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3g65_9fc/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4214018Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxux7xs1b 2022-12-01T10:36:33.4214284Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxux7xs1b/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4214386Z ok (6.426s) 2022-12-01T10:36:33.4214406Z 2022-12-01T10:36:33.4214674Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4214773Z Ran 1 test in 6.426s 2022-12-01T10:36:33.4214793Z 2022-12-01T10:36:33.4214887Z OK 2022-12-01T10:36:33.4214906Z 2022-12-01T10:36:33.4215030Z Generating XML reports... 
2022-12-01T10:36:33.4215493Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103157.xml 2022-12-01T10:36:33.4215858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4216029Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4216402Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4216589Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4216608Z 2022-12-01T10:36:33.4216698Z Running tests... 2022-12-01T10:36:33.4216961Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4217272Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4217540Z test_pass_default_pg (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4217816Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10207 2022-12-01T10:36:33.4218045Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10208 2022-12-01T10:36:33.4218416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4218588Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4218948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4219136Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4219499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4219674Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4220046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4220236Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4220461Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4220704Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4220929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4221147Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4221545Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4221938Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4222102Z ok (3.824s) 2022-12-01T10:36:33.4222122Z 2022-12-01T10:36:33.4222393Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4222505Z Ran 1 test in 3.824s 2022-12-01T10:36:33.4222525Z 2022-12-01T10:36:33.4222616Z OK 2022-12-01T10:36:33.4222635Z 2022-12-01T10:36:33.4222756Z Generating XML reports... 
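Editor's note: test_pass_default_pg passes above; the store-based-barrier lines in its output come from process-group initialisation. The name suggests constructing DDP with the default process group passed explicitly rather than implied. A sketch under that assumption.

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Assumes dist.init_process_group("nccl", ...) has already run.
    default_pg = dist.group.WORLD  # handle to the default process group

    model = torch.nn.Linear(10, 10).cuda(0)
    ddp = DDP(model, device_ids=[0], process_group=default_pg)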
2022-12-01T10:36:33.4223212Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103206.xml 2022-12-01T10:36:33.4223561Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4223732Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4224108Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4224299Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4224319Z 2022-12-01T10:36:33.4224427Z Running tests... 2022-12-01T10:36:33.4224688Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4224997Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4225277Z test_powerSGD_ddp_comm_hook_nccl (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4225478Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10314 2022-12-01T10:36:33.4225694Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10315 2022-12-01T10:36:33.4226061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4226233Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4226612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4226801Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4227217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4227401Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4227775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4227946Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4228172Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4228720Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4228953Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4229532Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4229941Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1fxt26rl 2022-12-01T10:36:33.4230251Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1fxt26rl/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4230553Z 
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzk02oly5 2022-12-01T10:36:33.4230921Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzk02oly5/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4231508Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4232087Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4232664Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4233253Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4233937Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4234553Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4235136Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4235699Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4236267Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; 
batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4236826Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4237396Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4238031Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4238647Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4239221Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4239368Z ok (6.544s) 2022-12-01T10:36:33.4239389Z 2022-12-01T10:36:33.4239652Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4239805Z Ran 1 test in 6.544s 2022-12-01T10:36:33.4239825Z 2022-12-01T10:36:33.4239951Z OK 2022-12-01T10:36:33.4239971Z 2022-12-01T10:36:33.4240144Z Generating XML reports... 2022-12-01T10:36:33.4240689Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103212.xml 2022-12-01T10:36:33.4241102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4241351Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4241784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4241962Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4242089Z 2022-12-01T10:36:33.4242191Z Running tests... 2022-12-01T10:36:33.4242683Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4243066Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4243400Z test_powerSGD_ddp_comm_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) ... 
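Editor's note: the repeated "PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; ..." lines above are logged each time a PowerSGD state is constructed and the communication hook registered on a DDP model. A minimal sketch of that registration using the parameters shown in the log; the model and process-group setup are assumed.

    import torch
    import torch.distributed as dist
    from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Assumes dist.init_process_group("nccl", ...) has already run on this rank.
    ddp = DDP(torch.nn.Linear(10, 10).cuda(0), device_ids=[0])

    state = powerSGD.PowerSGDState(
        process_group=None,            # None selects the default process group
        matrix_approximation_rank=1,   # values mirror the config lines in the log
        start_powerSGD_iter=1000,
        min_compression_rate=2,
        use_error_feedback=True,
        warm_start=True,
    )
    ddp.register_comm_hook(state, powerSGD.powerSGD_hook)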
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4243653Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10435 2022-12-01T10:36:33.4243904Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10436 2022-12-01T10:36:33.4244360Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4244582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4244953Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4245183Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4245595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4245803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4246262Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4246487Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4246755Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4247374Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4247749Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4248331Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4248575Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzn8ghnro 2022-12-01T10:36:33.4248879Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzn8ghnro/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4249171Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdj8eq5hu 2022-12-01T10:36:33.4249475Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdj8eq5hu/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4250050Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4250625Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4251310Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD 
config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4251909Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4252480Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4253045Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4253677Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4254242Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4254875Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4255440Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4256056Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4256629Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; 
compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-12-01T10:36:33.4257211Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4257835Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T10:36:33.4257985Z ok (6.444s) 2022-12-01T10:36:33.4258007Z 2022-12-01T10:36:33.4258334Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4258484Z Ran 1 test in 6.444s 2022-12-01T10:36:33.4258504Z 2022-12-01T10:36:33.4258632Z OK 2022-12-01T10:36:33.4258652Z 2022-12-01T10:36:33.4258844Z Generating XML reports... 2022-12-01T10:36:33.4259351Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103220.xml 2022-12-01T10:36:33.4259710Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4259967Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4260386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4260625Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4260646Z 2022-12-01T10:36:33.4260789Z Running tests... 2022-12-01T10:36:33.4261091Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4261442Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4261789Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4261995Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10556 2022-12-01T10:36:33.4262250Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10557 2022-12-01T10:36:33.4262740Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4262952Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4263376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4263605Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4264006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4264218Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4264662Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4264841Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4265161Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4265424Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4265717Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmu6di_gl 2022-12-01T10:36:33.4266019Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmu6di_gl/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4266310Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkb_0qhaa 2022-12-01T10:36:33.4266653Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkb_0qhaa/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4266926Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4267142Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4267327Z ok (7.938s) 2022-12-01T10:36:33.4267347Z 2022-12-01T10:36:33.4267652Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4267805Z Ran 1 test in 7.938s 2022-12-01T10:36:33.4267825Z 2022-12-01T10:36:33.4267951Z OK 2022-12-01T10:36:33.4267970Z 2022-12-01T10:36:33.4268181Z Generating XML reports... 2022-12-01T10:36:33.4296011Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103229.xml 2022-12-01T10:36:33.4296493Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4296671Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4297051Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4297232Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4297254Z 2022-12-01T10:36:33.4297353Z Running tests... 
2022-12-01T10:36:33.4297612Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4297925Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4298206Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4298418Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10677 2022-12-01T10:36:33.4298619Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10678 2022-12-01T10:36:33.4298983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4299147Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4299514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4299693Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4300221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4300388Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4300759Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4300931Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4301150Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4301365Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4301615Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu04kqxpg 2022-12-01T10:36:33.4301876Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu04kqxpg/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4302121Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg6fjcckp 2022-12-01T10:36:33.4302387Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg6fjcckp/_remote_module_non_scriptable.py 2022-12-01T10:36:33.4302614Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4302839Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:36:33.4302926Z ok (7.141s) 2022-12-01T10:36:33.4302947Z 2022-12-01T10:36:33.4303207Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4303309Z Ran 1 test in 7.141s 2022-12-01T10:36:33.4303329Z 2022-12-01T10:36:33.4303409Z OK 2022-12-01T10:36:33.4303428Z 2022-12-01T10:36:33.4303540Z Generating XML reports... 
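Editor's note: the two sync-batch-norm tests above pass even when some ranks see empty inputs. For context, a sketch of how SyncBatchNorm is normally combined with DDP; the layer sizes and input shapes are illustrative only.

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Assumes the NCCL process group is already initialised.
    net = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3),
        torch.nn.BatchNorm2d(8),
        torch.nn.ReLU(),
    ).cuda(0)

    # Replace BatchNorm layers with SyncBatchNorm so statistics are reduced across ranks.
    net = torch.nn.SyncBatchNorm.convert_sync_batchnorm(net)
    ddp = DDP(net, device_ids=[0])

    out = ddp(torch.randn(4, 3, 16, 16, device="cuda:0"))
    out.sum().backward()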
2022-12-01T10:36:33.4303995Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103239.xml 2022-12-01T10:36:33.4304356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4304526Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4304966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4305163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4305183Z 2022-12-01T10:36:33.4305280Z Running tests... 2022-12-01T10:36:33.4305535Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4305834Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4306091Z test_invalid_nccl_blocking_wait_env (__main__.NcclErrorHandlingTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4306298Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10798 2022-12-01T10:36:33.4306504Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10799 2022-12-01T10:36:33.4306715Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 10800 2022-12-01T10:36:33.4307071Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4307237Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4307606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4307785Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4308141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4308304Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4308671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4308915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4309271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4309439Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4309817Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4309995Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4310210Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:36:33.4310429Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4310652Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4310798Z skip: Need at least 3 CUDA devices (3.827s) 2022-12-01T10:36:33.4310822Z 2022-12-01T10:36:33.4311086Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4311180Z Ran 1 test in 3.827s 2022-12-01T10:36:33.4311200Z 2022-12-01T10:36:33.4311306Z OK (skipped=1) 
2022-12-01T10:36:33.4311326Z 2022-12-01T10:36:33.4311438Z Generating XML reports... 2022-12-01T10:36:33.4311860Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103249.xml 2022-12-01T10:36:33.4312227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4312391Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4312764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4312948Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4312971Z 2022-12-01T10:36:33.4313064Z Running tests... 2022-12-01T10:36:33.4313336Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4313717Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4313998Z test_nccl_blocking_wait_with_barrier (__main__.NcclErrorHandlingTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4314218Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10935 2022-12-01T10:36:33.4314434Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10936 2022-12-01T10:36:33.4314650Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 10937 2022-12-01T10:36:33.4315024Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4315201Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4315562Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4315757Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4316127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4316303Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4316680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4316870Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4317235Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4317410Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4317766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4318019Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4318254Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4318485Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:36:33.4318711Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4318862Z skip: Need at least 3 CUDA devices (3.802s) 2022-12-01T10:36:33.4318882Z 2022-12-01T10:36:33.4319151Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4319263Z Ran 1 test in 3.802s 
2022-12-01T10:36:33.4319283Z 2022-12-01T10:36:33.4319390Z OK (skipped=1) 2022-12-01T10:36:33.4319409Z 2022-12-01T10:36:33.4319515Z Generating XML reports... 2022-12-01T10:36:33.4319952Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103255.xml 2022-12-01T10:36:33.4320327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4320506Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4320882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4321073Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4321093Z 2022-12-01T10:36:33.4321202Z Running tests... 2022-12-01T10:36:33.4321467Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4321817Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4322155Z test_nccl_errors_blocking_abort (__main__.NcclErrorHandlingTest) ... skip: Frequently times out see https://github.com/pytorch/pytorch/issues/58920 (0.000s) 2022-12-01T10:36:33.4322179Z 2022-12-01T10:36:33.4322697Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4322823Z Ran 1 test in 0.001s 2022-12-01T10:36:33.4322843Z 2022-12-01T10:36:33.4323037Z OK (skipped=1) 2022-12-01T10:36:33.4323061Z 2022-12-01T10:36:33.4323196Z Generating XML reports... 2022-12-01T10:36:33.4323644Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103301.xml 2022-12-01T10:36:33.4324011Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4324188Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4324545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4324736Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4324755Z 2022-12-01T10:36:33.4324870Z Running tests... 2022-12-01T10:36:33.4325136Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4325449Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4325720Z test_nccl_errors_blocking_clean_exit (__main__.NcclErrorHandlingTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4325940Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11105 2022-12-01T10:36:33.4326160Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11106 2022-12-01T10:36:33.4326374Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 11107 2022-12-01T10:36:33.4326730Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4326904Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4327284Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4327612Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4327985Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4328162Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4328538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4328728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4329075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4329251Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4329626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4329819Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4330051Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4330280Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4330505Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:36:33.4330653Z skip: Need at least 3 CUDA devices (3.832s) 2022-12-01T10:36:33.4330674Z 2022-12-01T10:36:33.4330940Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4331036Z Ran 1 test in 3.832s 2022-12-01T10:36:33.4331056Z 2022-12-01T10:36:33.4331163Z OK (skipped=1) 2022-12-01T10:36:33.4331182Z 2022-12-01T10:36:33.4331305Z Generating XML reports... 2022-12-01T10:36:33.4331739Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103303.xml 2022-12-01T10:36:33.4332112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4332342Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4332737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4332929Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4332948Z 2022-12-01T10:36:33.4333057Z Running tests... 
2022-12-01T10:36:33.4333303Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4333613Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4333888Z test_nccl_errors_blocking_nonzero_exit (__main__.NcclErrorHandlingTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4334104Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11242 2022-12-01T10:36:33.4334324Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11243 2022-12-01T10:36:33.4334540Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 11244 2022-12-01T10:36:33.4334914Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4335087Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4335448Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4335638Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4336004Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4336179Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4336619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4336809Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4337179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4337356Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4337716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4337906Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4338136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4338363Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4338588Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:36:33.4338696Z ok (3.809s) 2022-12-01T10:36:33.4338715Z 2022-12-01T10:36:33.4338982Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4339100Z Ran 1 test in 3.810s 2022-12-01T10:36:33.4339120Z 2022-12-01T10:36:33.4339213Z OK 2022-12-01T10:36:33.4339232Z 2022-12-01T10:36:33.4339339Z Generating XML reports... 
2022-12-01T10:36:33.4339774Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103309.xml 2022-12-01T10:36:33.4340144Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4340320Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4340696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4340887Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4340910Z 2022-12-01T10:36:33.4341020Z Running tests... 2022-12-01T10:36:33.4341283Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4341632Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4341913Z test_nccl_errors_blocking_sigkill (__main__.NcclErrorHandlingTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4342132Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11379 2022-12-01T10:36:33.4342349Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11380 2022-12-01T10:36:33.4342563Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 11381 2022-12-01T10:36:33.4342937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4343115Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4343499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4343693Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4344043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4344216Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4344592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4344781Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4345146Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4345320Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4345697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4345947Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4346161Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4346388Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:36:33.4346616Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4346717Z ok (3.939s) 2022-12-01T10:36:33.4346737Z 2022-12-01T10:36:33.4347005Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4347118Z Ran 1 test in 3.939s 2022-12-01T10:36:33.4347138Z 2022-12-01T10:36:33.4347231Z OK 2022-12-01T10:36:33.4347250Z 2022-12-01T10:36:33.4347375Z 
Generating XML reports... 2022-12-01T10:36:33.4347790Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103315.xml 2022-12-01T10:36:33.4348162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4348339Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4348713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4348902Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4348922Z 2022-12-01T10:36:33.4349031Z Running tests... 2022-12-01T10:36:33.4349293Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4349603Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4349869Z test_nccl_errors_blocking_sigterm (__main__.NcclErrorHandlingTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4350068Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11516 2022-12-01T10:36:33.4350287Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11517 2022-12-01T10:36:33.4350551Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 11518 2022-12-01T10:36:33.4350937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4351111Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4351485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4351680Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4352046Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4352201Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4352578Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4352772Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4353139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4353313Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4353690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4353880Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4354109Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:36:33.4354336Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4354544Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4354709Z ok (3.839s) 2022-12-01T10:36:33.4354729Z 2022-12-01T10:36:33.4355007Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4355121Z Ran 1 test in 3.839s 2022-12-01T10:36:33.4355141Z 2022-12-01T10:36:33.4355235Z OK 2022-12-01T10:36:33.4355254Z 
2022-12-01T10:36:33.4355380Z Generating XML reports... 2022-12-01T10:36:33.4355817Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103321.xml 2022-12-01T10:36:33.4356187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4356344Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4356720Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4356912Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4356935Z 2022-12-01T10:36:33.4357046Z Running tests... 2022-12-01T10:36:33.4357307Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4357624Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4357892Z test_nccl_errors_nonblocking (__main__.NcclErrorHandlingTest) ... skip: Test does not pass when run locally (0.001s) 2022-12-01T10:36:33.4357911Z 2022-12-01T10:36:33.4358173Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4358287Z Ran 1 test in 0.001s 2022-12-01T10:36:33.4358306Z 2022-12-01T10:36:33.4358395Z OK (skipped=1) 2022-12-01T10:36:33.4358414Z 2022-12-01T10:36:33.4358538Z Generating XML reports... 2022-12-01T10:36:33.4358969Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103327.xml 2022-12-01T10:36:33.4359336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4359514Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4359945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4360148Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4360167Z 2022-12-01T10:36:33.4360277Z Running tests... 2022-12-01T10:36:33.4360545Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4360838Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4361078Z test_nccl_timeout (__main__.NcclErrorHandlingTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4361297Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11686 2022-12-01T10:36:33.4361513Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11687 2022-12-01T10:36:33.4361731Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 11688 2022-12-01T10:36:33.4362108Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4362285Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4363016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4363192Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4363570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4363743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4364115Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4364403Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4364776Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4364951Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4365327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4365518Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4365730Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4365957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:36:33.4366183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4366333Z skip: Need at least 3 CUDA devices (3.844s) 2022-12-01T10:36:33.4366357Z 2022-12-01T10:36:33.4366664Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4366780Z Ran 1 test in 3.845s 2022-12-01T10:36:33.4366799Z 2022-12-01T10:36:33.4366912Z OK (skipped=1) 2022-12-01T10:36:33.4366932Z 2022-12-01T10:36:33.4367055Z Generating XML reports... 2022-12-01T10:36:33.4367472Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103330.xml 2022-12-01T10:36:33.4367840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4368014Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4368386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4368577Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4368600Z 2022-12-01T10:36:33.4368710Z Running tests... 
2022-12-01T10:36:33.4368974Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4369357Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4369704Z test_collectives (__main__.NcclProcessGroupWithDispatchedCollectivesTests) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4369906Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11823 2022-12-01T10:36:33.4370278Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4370457Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4370832Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4371021Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4371253Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4371503Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4371907Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-12-01T10:36:33.4371993Z ok (5.031s) 2022-12-01T10:36:33.4372014Z 2022-12-01T10:36:33.4372281Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4372398Z Ran 1 test in 5.031s 2022-12-01T10:36:33.4372418Z 2022-12-01T10:36:33.4372512Z OK 2022-12-01T10:36:33.4372531Z 2022-12-01T10:36:33.4372655Z Generating XML reports... 2022-12-01T10:36:33.4373213Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclProcessGroupWithDispatchedCollectivesTests-20221201103336.xml 2022-12-01T10:36:33.4373580Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4373821Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4374207Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4374381Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4374401Z 2022-12-01T10:36:33.4374509Z Running tests... 2022-12-01T10:36:33.4374773Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4375081Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4375345Z test_init_no_gpus (__main__.ProcessGroupNCCLNoGPUTest) ... skip: GPUs are available, skipping test (0.001s) 2022-12-01T10:36:33.4375365Z 2022-12-01T10:36:33.4375623Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4375736Z Ran 1 test in 0.001s 2022-12-01T10:36:33.4375759Z 2022-12-01T10:36:33.4375867Z OK (skipped=1) 2022-12-01T10:36:33.4375886Z 2022-12-01T10:36:33.4376010Z Generating XML reports... 
2022-12-01T10:36:33.4376451Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLNoGPUTest-20221201103343.xml 2022-12-01T10:36:33.4376819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4376996Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4377371Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4377561Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4377581Z 2022-12-01T10:36:33.4377689Z Running tests... 2022-12-01T10:36:33.4377948Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4378258Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4378507Z test_allgather_base_basics (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4378792Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11931 2022-12-01T10:36:33.4379025Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11932 2022-12-01T10:36:33.4379396Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4379571Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4379949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4380140Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4380511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4380693Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4381055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4381246Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4381477Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4381722Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4381946Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4382184Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4382588Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4383053Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4383138Z ok (5.342s) 2022-12-01T10:36:33.4383176Z 2022-12-01T10:36:33.4383424Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4383537Z Ran 1 test in 5.342s 2022-12-01T10:36:33.4383557Z 2022-12-01T10:36:33.4383650Z OK 2022-12-01T10:36:33.4383670Z 2022-12-01T10:36:33.4383794Z Generating XML reports... 
2022-12-01T10:36:33.4384225Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103345.xml 2022-12-01T10:36:33.4384592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4384768Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4385148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4385325Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4385344Z 2022-12-01T10:36:33.4385453Z Running tests... 2022-12-01T10:36:33.4385719Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4386032Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4386287Z test_allgather_base_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4386506Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12040 2022-12-01T10:36:33.4386724Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12041 2022-12-01T10:36:33.4387091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4387248Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4387630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4387818Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4388233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4388417Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4388795Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4388985Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4389213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4389458Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4389664Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4389907Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4390313Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4390711Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4390815Z ok (6.337s) 2022-12-01T10:36:33.4390836Z 2022-12-01T10:36:33.4391100Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4391212Z Ran 1 test in 6.337s 2022-12-01T10:36:33.4391231Z 2022-12-01T10:36:33.4391324Z OK 2022-12-01T10:36:33.4391343Z 2022-12-01T10:36:33.4391449Z Generating XML reports... 
2022-12-01T10:36:33.4391885Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103353.xml 2022-12-01T10:36:33.4392323Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4392500Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4392881Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4393073Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4393092Z 2022-12-01T10:36:33.4393202Z Running tests... 2022-12-01T10:36:33.4393462Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4393775Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4394001Z test_allgather_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4394221Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12157 2022-12-01T10:36:33.4394442Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12158 2022-12-01T10:36:33.4394816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4394990Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4395370Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4395559Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4395928Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4396084Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4396460Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4396651Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4396884Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4397180Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4397422Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4397661Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4398066Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4398460Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4398544Z ok (6.439s) 2022-12-01T10:36:33.4398564Z 2022-12-01T10:36:33.4398825Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4398941Z Ran 1 test in 6.439s 2022-12-01T10:36:33.4398961Z 2022-12-01T10:36:33.4399055Z OK 2022-12-01T10:36:33.4399075Z 2022-12-01T10:36:33.4399199Z Generating XML reports... 
2022-12-01T10:36:33.4399635Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103401.xml 2022-12-01T10:36:33.4400006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4400182Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4400559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4400732Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4400752Z 2022-12-01T10:36:33.4400861Z Running tests... 2022-12-01T10:36:33.4401122Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4401500Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4401745Z test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4401966Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12274 2022-12-01T10:36:33.4402185Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12275 2022-12-01T10:36:33.4402747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4402912Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4403295Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4403485Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4403852Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4404033Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4404414Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4404604Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4404833Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4405060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4405284Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4405525Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4405930Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4406332Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4406434Z ok (6.435s) 2022-12-01T10:36:33.4406541Z 2022-12-01T10:36:33.4406825Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4406938Z Ran 1 test in 6.435s 2022-12-01T10:36:33.4406958Z 2022-12-01T10:36:33.4407051Z OK 2022-12-01T10:36:33.4407070Z 2022-12-01T10:36:33.4407177Z Generating XML reports... 
2022-12-01T10:36:33.4407610Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103410.xml 2022-12-01T10:36:33.4407979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4408153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4408533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4408726Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4408746Z 2022-12-01T10:36:33.4408858Z Running tests... 2022-12-01T10:36:33.4409118Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4409431Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4409649Z test_barrier (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4409869Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12391 2022-12-01T10:36:33.4410085Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12392 2022-12-01T10:36:33.4410454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4410629Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4411091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4411286Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4411656Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4411813Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4412192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4412381Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4412609Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4412855Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4413083Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4413323Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4413727Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4414128Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4414215Z ok (6.440s) 2022-12-01T10:36:33.4414234Z 2022-12-01T10:36:33.4414498Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4414612Z Ran 1 test in 6.440s 2022-12-01T10:36:33.4414631Z 2022-12-01T10:36:33.4414724Z OK 2022-12-01T10:36:33.4414743Z 2022-12-01T10:36:33.4414866Z Generating XML reports... 
2022-12-01T10:36:33.4415297Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103419.xml 2022-12-01T10:36:33.4415670Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4415895Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4416272Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4416464Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4416484Z 2022-12-01T10:36:33.4416591Z Running tests... 2022-12-01T10:36:33.4416852Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4417165Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4417411Z test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4417630Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12508 2022-12-01T10:36:33.4417852Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12509 2022-12-01T10:36:33.4418226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4418384Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4418762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4418953Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4419318Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4419491Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4419867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4420118Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4420348Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4420576Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4420803Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4421044Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4421452Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4421847Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4421950Z ok (6.412s) 2022-12-01T10:36:33.4421969Z 2022-12-01T10:36:33.4422233Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4422351Z Ran 1 test in 6.412s 2022-12-01T10:36:33.4422371Z 2022-12-01T10:36:33.4422465Z OK 2022-12-01T10:36:33.4422484Z 2022-12-01T10:36:33.4422590Z Generating XML reports... 
2022-12-01T10:36:33.4423029Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103427.xml 2022-12-01T10:36:33.4423401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4423578Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4423953Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4424145Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4424165Z 2022-12-01T10:36:33.4424276Z Running tests... 2022-12-01T10:36:33.4424536Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4424835Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4425130Z test_empty_tensors (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4425359Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12625 2022-12-01T10:36:33.4425574Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12626 2022-12-01T10:36:33.4425948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4426122Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4426500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4426688Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4427053Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4427212Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4427594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4427782Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4428009Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4428253Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4428477Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4428715Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4429116Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4429591Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4429681Z ok (6.424s) 2022-12-01T10:36:33.4429701Z 2022-12-01T10:36:33.4429966Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4430079Z Ran 1 test in 6.424s 2022-12-01T10:36:33.4430099Z 2022-12-01T10:36:33.4430192Z OK 2022-12-01T10:36:33.4430211Z 2022-12-01T10:36:33.4430335Z Generating XML reports... 
2022-12-01T10:36:33.4430767Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103436.xml 2022-12-01T10:36:33.4431137Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4431314Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4431670Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4431863Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4431883Z 2022-12-01T10:36:33.4431994Z Running tests... 2022-12-01T10:36:33.4432260Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4432575Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4432820Z test_gather_checks (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4433038Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12742 2022-12-01T10:36:33.4433256Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12743 2022-12-01T10:36:33.4433628Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4433787Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4434166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4434407Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4434789Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4434964Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4435343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4435532Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4435762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4435988Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4436214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4436453Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4436855Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4437254Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4437359Z ok (5.248s) 2022-12-01T10:36:33.4437379Z 2022-12-01T10:36:33.4437644Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4437756Z Ran 1 test in 5.248s 2022-12-01T10:36:33.4437775Z 2022-12-01T10:36:33.4437867Z OK 2022-12-01T10:36:33.4437886Z 2022-12-01T10:36:33.4437993Z Generating XML reports... 
2022-12-01T10:36:33.4438428Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103445.xml 2022-12-01T10:36:33.4438858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4439037Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4439416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4439606Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4439626Z 2022-12-01T10:36:33.4439736Z Running tests... 2022-12-01T10:36:33.4439997Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4440290Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4440527Z test_gather_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4440745Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12851 2022-12-01T10:36:33.4440964Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12852 2022-12-01T10:36:33.4441339Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4441516Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4441894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4442085Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4442679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4442846Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4443233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4443429Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4443655Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4443993Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4444231Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4444472Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4444878Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4445255Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4445362Z ok (6.437s) 2022-12-01T10:36:33.4445383Z 2022-12-01T10:36:33.4445649Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4445766Z Ran 1 test in 6.437s 2022-12-01T10:36:33.4445786Z 2022-12-01T10:36:33.4445879Z OK 2022-12-01T10:36:33.4445898Z 2022-12-01T10:36:33.4446027Z Generating XML reports... 
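test_gather_checks and test_gather_ops above exercise the gather collective. A small illustrative sketch of the same call pattern, meant to be dropped into the run() harness sketched earlier (shapes and values are made up):

import torch
import torch.distributed as dist

def gather_example(rank: int, world_size: int) -> None:
    t = torch.full((2,), float(rank), device="cuda")
    if rank == 0:
        # Only the destination rank passes gather_list; it receives one tensor per rank.
        out = [torch.empty(2, device="cuda") for _ in range(world_size)]
        dist.gather(t, gather_list=out, dst=0)  # out == [[0., 0.], [1., 1.]] with 2 ranks
    else:
        dist.gather(t, dst=0)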
2022-12-01T10:36:33.4446461Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103452.xml 2022-12-01T10:36:33.4446830Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4447005Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4447364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4447554Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4447574Z 2022-12-01T10:36:33.4447683Z Running tests... 2022-12-01T10:36:33.4447944Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4448340Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4448589Z test_gather_stress (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4448808Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12972 2022-12-01T10:36:33.4449025Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12973 2022-12-01T10:36:33.4449378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4449554Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4449933Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4450124Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4450490Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4450667Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4451048Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4451239Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4451464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4451694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4451912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4452151Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4452551Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4452950Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4453104Z ok (10.345s) 2022-12-01T10:36:33.4453126Z 2022-12-01T10:36:33.4453409Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4453526Z Ran 1 test in 10.345s 2022-12-01T10:36:33.4453546Z 2022-12-01T10:36:33.4453637Z OK 2022-12-01T10:36:33.4453657Z 2022-12-01T10:36:33.4453765Z Generating XML reports... 
2022-12-01T10:36:33.4454201Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103501.xml 2022-12-01T10:36:33.4454568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4454745Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4455121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4455316Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4455339Z 2022-12-01T10:36:33.4455448Z Running tests... 2022-12-01T10:36:33.4455709Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4456002Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4456248Z test_reduce_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4456470Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13093 2022-12-01T10:36:33.4456688Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13094 2022-12-01T10:36:33.4457058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4457296Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4457679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4457872Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4458241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4458396Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4458771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4458961Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4459190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4459436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4459663Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4459905Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4460307Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4460684Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4460789Z ok (6.545s) 2022-12-01T10:36:33.4460808Z 2022-12-01T10:36:33.4461071Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4461184Z Ran 1 test in 6.546s 2022-12-01T10:36:33.4461203Z 2022-12-01T10:36:33.4461296Z OK 2022-12-01T10:36:33.4461315Z 2022-12-01T10:36:33.4461439Z Generating XML reports... 
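test_reduce_ops above covers the reduce collective. A hedged sketch of the call it exercises, again assuming the two-rank harness (values are illustrative):

import torch
import torch.distributed as dist

def reduce_example(rank: int, world_size: int) -> None:
    t = torch.full((4,), float(rank + 1), device="cuda")
    # Sum every rank's tensor onto rank 0; non-destination ranks keep their local value.
    dist.reduce(t, dst=0, op=dist.ReduceOp.SUM)  # on rank 0, t == [3., 3., 3., 3.] with 2 ranks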
2022-12-01T10:36:33.4461875Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103514.xml 2022-12-01T10:36:33.4462251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4462480Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4462855Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4463046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4463066Z 2022-12-01T10:36:33.4463175Z Running tests... 2022-12-01T10:36:33.4463441Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4463753Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4464018Z test_reduce_scatter_base_basics (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4464241Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13210 2022-12-01T10:36:33.4464457Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13211 2022-12-01T10:36:33.4464811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4464988Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4465364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4465552Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4465919Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4466093Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4466508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4466761Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4466995Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4467222Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4467447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4467684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4468091Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4468483Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4468587Z ok (5.212s) 2022-12-01T10:36:33.4468607Z 2022-12-01T10:36:33.4468874Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4468988Z Ran 1 test in 5.212s 2022-12-01T10:36:33.4469008Z 2022-12-01T10:36:33.4469082Z OK 2022-12-01T10:36:33.4469119Z 2022-12-01T10:36:33.4469229Z Generating XML reports... 
2022-12-01T10:36:33.4469664Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103522.xml 2022-12-01T10:36:33.4470035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4470210Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4470584Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4470774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4470793Z 2022-12-01T10:36:33.4470901Z Running tests... 2022-12-01T10:36:33.4471168Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4471462Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4471825Z test_reduce_scatter_base_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4472059Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13319 2022-12-01T10:36:33.4472399Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13320 2022-12-01T10:36:33.4472778Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4472936Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4473313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4473502Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4473876Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4474055Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4474435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4474622Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4474851Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4475079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4475301Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4475542Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4476008Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4476409Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4476514Z ok (6.447s) 2022-12-01T10:36:33.4476534Z 2022-12-01T10:36:33.4476798Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4476913Z Ran 1 test in 6.447s 2022-12-01T10:36:33.4476933Z 2022-12-01T10:36:33.4477025Z OK 2022-12-01T10:36:33.4477044Z 2022-12-01T10:36:33.4477149Z Generating XML reports... 
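The reduce_scatter_base tests above exercise the single-tensor variant; the public equivalents are reduce_scatter_tensor and all_gather_into_tensor, the replacements named by the deprecation warnings further down in this log. A sketch under the same two-rank assumption (n is an arbitrary per-rank shard length):

import torch
import torch.distributed as dist

def reduce_scatter_example(rank: int, world_size: int) -> None:
    n = 4
    inp = torch.arange(world_size * n, dtype=torch.float32, device="cuda")
    out = torch.empty(n, device="cuda")
    # Element-wise sum across ranks, then shard: rank r receives slice [r*n:(r+1)*n].
    dist.reduce_scatter_tensor(out, inp)
    # Inverse pattern: gather every rank's shard back into one full tensor.
    full = torch.empty(world_size * n, device="cuda")
    dist.all_gather_into_tensor(full, out)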
2022-12-01T10:36:33.4477583Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103530.xml 2022-12-01T10:36:33.4477952Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4478126Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4478508Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4478699Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4478719Z 2022-12-01T10:36:33.4478830Z Running tests... 2022-12-01T10:36:33.4479090Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4479384Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4479636Z test_reduce_scatter_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4479854Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13436 2022-12-01T10:36:33.4480072Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13437 2022-12-01T10:36:33.4480441Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4480620Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4481056Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4481261Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4481632Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4481787Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4482162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4482349Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4482781Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4483032Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4483261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4483504Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4483913Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4484309Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4484395Z ok (6.438s) 2022-12-01T10:36:33.4484415Z 2022-12-01T10:36:33.4484682Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4484799Z Ran 1 test in 6.439s 2022-12-01T10:36:33.4484818Z 2022-12-01T10:36:33.4484909Z OK 2022-12-01T10:36:33.4484929Z 2022-12-01T10:36:33.4485053Z Generating XML reports... 
2022-12-01T10:36:33.4485592Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103539.xml 2022-12-01T10:36:33.4485964Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4486141Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4486498Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4486691Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4486711Z 2022-12-01T10:36:33.4486820Z Running tests... 2022-12-01T10:36:33.4487083Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4487392Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4487639Z test_scatter_checks (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4487861Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13553 2022-12-01T10:36:33.4488077Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13554 2022-12-01T10:36:33.4488432Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4488610Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4488989Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4489178Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4489545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4489719Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4490096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4490293Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4490594Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4490840Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4491065Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4491306Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4491708Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4492104Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4492207Z ok (5.251s) 2022-12-01T10:36:33.4492231Z 2022-12-01T10:36:33.4492494Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4492606Z Ran 1 test in 5.251s 2022-12-01T10:36:33.4492626Z 2022-12-01T10:36:33.4492723Z OK 2022-12-01T10:36:33.4492742Z 2022-12-01T10:36:33.4492849Z Generating XML reports... 
2022-12-01T10:36:33.4493284Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103547.xml 2022-12-01T10:36:33.4493654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4493831Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4494212Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4494404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4494424Z 2022-12-01T10:36:33.4494593Z Running tests... 2022-12-01T10:36:33.4494860Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4495158Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4495398Z test_scatter_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4495616Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13662 2022-12-01T10:36:33.4495834Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13663 2022-12-01T10:36:33.4496204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4496378Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4496755Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4496945Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4497316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4497477Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4497854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4498043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4498272Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4498517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4498742Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4498981Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4499384Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4499815Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4499925Z ok (6.416s) 2022-12-01T10:36:33.4499945Z 2022-12-01T10:36:33.4500209Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4500324Z Ran 1 test in 6.416s 2022-12-01T10:36:33.4500344Z 2022-12-01T10:36:33.4500435Z OK 2022-12-01T10:36:33.4500454Z 2022-12-01T10:36:33.4500579Z Generating XML reports... 
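test_scatter_checks and test_scatter_ops above cover scatter. An illustrative sketch with made-up shapes, assuming the harness sketched earlier:

import torch
import torch.distributed as dist

def scatter_example(rank: int, world_size: int) -> None:
    out = torch.empty(2, device="cuda")
    if rank == 0:
        # Only the source rank provides scatter_list; rank r receives chunk r.
        chunks = [torch.full((2,), float(r), device="cuda") for r in range(world_size)]
        dist.scatter(out, scatter_list=chunks, src=0)
    else:
        dist.scatter(out, src=0)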
2022-12-01T10:36:33.4501013Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103555.xml 2022-12-01T10:36:33.4501383Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4501558Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4501922Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4502116Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4502136Z 2022-12-01T10:36:33.4502246Z Running tests... 2022-12-01T10:36:33.4502509Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4502822Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4503069Z test_scatter_stress (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4503288Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13783 2022-12-01T10:36:33.4503505Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13784 2022-12-01T10:36:33.4503855Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4504093Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4504479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4504668Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4505036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4505209Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4505586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4505776Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4506005Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4506240Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4506464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4506707Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4507108Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4507503Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4507607Z ok (10.435s) 2022-12-01T10:36:33.4507627Z 2022-12-01T10:36:33.4507890Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4508008Z Ran 1 test in 10.435s 2022-12-01T10:36:33.4508027Z 2022-12-01T10:36:33.4508101Z OK 2022-12-01T10:36:33.4508138Z 2022-12-01T10:36:33.4508243Z Generating XML reports... 
2022-12-01T10:36:33.4508681Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103604.xml 2022-12-01T10:36:33.4509107Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4509295Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4509676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4509866Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4509886Z 2022-12-01T10:36:33.4509996Z Running tests... 2022-12-01T10:36:33.4510254Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4510549Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4510786Z test_send_recv (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4511010Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13904 2022-12-01T10:36:33.4511229Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13905 2022-12-01T10:36:33.4511601Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4511777Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4512152Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4512343Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4512690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4512866Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4513239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4513492Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4513723Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:36:33.4513968Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4514192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:36:33.4514431Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:36:33.4514838Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4515216Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:36:33.4515324Z ok (5.322s) 2022-12-01T10:36:33.4515344Z 2022-12-01T10:36:33.4515604Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4515718Z Ran 1 test in 5.323s 2022-12-01T10:36:33.4515741Z 2022-12-01T10:36:33.4515834Z OK 2022-12-01T10:36:33.4515853Z 2022-12-01T10:36:33.4515978Z Generating XML reports... 
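test_send_recv above exercises NCCL point-to-point communication. A minimal sketch of the pattern, assuming the two-rank harness (the payload is made up):

import torch
import torch.distributed as dist

def send_recv_example(rank: int, world_size: int) -> None:
    t = torch.empty(4, device="cuda")
    if rank == 0:
        t.fill_(42.0)
        dist.send(t, dst=1)  # blocking point-to-point send
    elif rank == 1:
        dist.recv(t, src=0)  # t now holds [42., 42., 42., 42.]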
2022-12-01T10:36:33.4516415Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103616.xml 2022-12-01T10:36:33.4516785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4516959Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4517319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4517509Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4517533Z 2022-12-01T10:36:33.4517643Z Running tests... 2022-12-01T10:36:33.4517905Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4518267Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4518508Z test_common_errors (__main__.RendezvousEnvTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4518753Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4519159Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-12-01T10:36:33.4519384Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4519779Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-12-01T10:36:33.4520020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4520418Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-12-01T10:36:33.4520663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4521058Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-12-01T10:36:33.4521163Z ok (1.544s) 2022-12-01T10:36:33.4521182Z 2022-12-01T10:36:33.4521443Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4521555Z Ran 1 test in 1.544s 2022-12-01T10:36:33.4521575Z 2022-12-01T10:36:33.4521650Z OK 2022-12-01T10:36:33.4521669Z 2022-12-01T10:36:33.4521793Z Generating XML reports... 2022-12-01T10:36:33.4522209Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-RendezvousEnvTest-20221201103624.xml 2022-12-01T10:36:33.4522898Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:36:33.4523075Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:36:33.4523459Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:36:33.4523651Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:36:33.4523671Z 2022-12-01T10:36:33.4523782Z Running tests... 2022-12-01T10:36:33.4524045Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4524338Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-12-01T10:36:33.4524575Z test_default_store_timeout_nccl (__main__.TimeoutTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:36:33.4524818Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4525224Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-12-01T10:36:33.4525468Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:36:33.4525864Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-12-01T10:36:33.4525968Z ok (4.588s) 2022-12-01T10:36:33.4525986Z 2022-12-01T10:36:33.4526248Z ---------------------------------------------------------------------- 2022-12-01T10:36:33.4526343Z Ran 1 test in 4.588s 2022-12-01T10:36:33.4526380Z 2022-12-01T10:36:33.4526456Z OK 2022-12-01T10:36:33.4526475Z 2022-12-01T10:36:33.4526600Z Generating XML reports... 2022-12-01T10:36:33.4526996Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-TimeoutTest-20221201103628.xml 2022-12-01T10:36:33.4527016Z 2022-12-01T10:36:33.4527439Z ##[endgroup] 2022-12-01T10:36:33.4527880Z FINISHED PRINTING LOG FILE of distributed/test_c10d_nccl (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_nccl_q8lksq9h) 2022-12-01T10:36:33.4527901Z 2022-12-01T10:36:33.4528251Z Running distributed/fsdp/test_fsdp_core ... [2022-12-01 10:36:33.319466] 2022-12-01T10:36:33.4528751Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_core.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:36:33.319840] 2022-12-01T10:42:33.6449190Z 2022-12-01T10:42:33.6449665Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_core 2022-12-01T10:42:33.6454171Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_core (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_core_nmavg695) 2022-12-01T10:42:33.6461399Z 2022-12-01T10:42:33.6461735Z Running tests... 2022-12-01T10:42:33.6462273Z ---------------------------------------------------------------------- 2022-12-01T10:42:33.6462844Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_core 2022-12-01T10:42:33.6465121Z test_pre_backward_hook_registration_after_state_dict (__main__.TestHooks) 2022-12-01T10:42:33.6465810Z Tests that FSDP pre-backward hooks are registered on forward pass ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:42:33.6466325Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14114 2022-12-01T10:42:33.6468250Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14115 2022-12-01T10:42:33.6469135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6469747Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6470462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6471086Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6472255Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6472728Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6473576Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6474045Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6474613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6475277Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6476178Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6476930Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6477723Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6478213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6479754Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6480879Z warnings.warn( 2022-12-01T10:42:33.6482290Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6483828Z warnings.warn( 2022-12-01T10:42:33.6484933Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:42:33.6485589Z warnings.warn( 2022-12-01T10:42:33.6486559Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6487346Z warnings.warn( 2022-12-01T10:42:33.6488131Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6488992Z warnings.warn( 2022-12-01T10:42:33.6489937Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6490580Z warnings.warn( 2022-12-01T10:42:33.6490803Z dist init r=1, world=2 2022-12-01T10:42:33.6491149Z dist init r=0, world=2 2022-12-01T10:42:33.6491559Z ok (5.876s) 2022-12-01T10:42:33.6491870Z test_pre_backward_hook_registration_cuda_first_False (__main__.TestHooks) 2022-12-01T10:42:33.6492771Z Tests that FSDP pre-backward hooks are registered on forward pass ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14197 2022-12-01T10:42:33.6493382Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14198 2022-12-01T10:42:33.6494397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6494849Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6495716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6496202Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6497041Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6497503Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6498311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6498817Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6499872Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6500390Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6501329Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6502105Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6502830Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6503321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6504866Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. 
We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6505979Z warnings.warn( 2022-12-01T10:42:33.6507424Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6508478Z warnings.warn( 2022-12-01T10:42:33.6509498Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6510080Z warnings.warn( 2022-12-01T10:42:33.6511127Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6511670Z warnings.warn( 2022-12-01T10:42:33.6512731Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6513426Z warnings.warn( 2022-12-01T10:42:33.6514319Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6515133Z warnings.warn( 2022-12-01T10:42:33.6515450Z dist init r=1, world=2 2022-12-01T10:42:33.6515706Z dist init r=0, world=2 2022-12-01T10:42:33.6515950Z ok (4.311s) 2022-12-01T10:42:33.6516323Z test_pre_backward_hook_registration_cuda_first_True (__main__.TestHooks) 2022-12-01T10:42:33.6517315Z Tests that FSDP pre-backward hooks are registered on forward pass ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14280 2022-12-01T10:42:33.6517877Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14281 2022-12-01T10:42:33.6518466Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6519019Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6519782Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6520270Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6520827Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6521286Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6521861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6522321Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6523315Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6523901Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6524566Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6525271Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6525786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6526366Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6527263Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6527819Z warnings.warn( 2022-12-01T10:42:33.6528562Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6529096Z warnings.warn( 2022-12-01T10:42:33.6529866Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6530595Z warnings.warn( 2022-12-01T10:42:33.6531467Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:42:33.6532031Z warnings.warn( 2022-12-01T10:42:33.6532278Z dist init r=1, world=2 2022-12-01T10:42:33.6532538Z dist init r=0, world=2 2022-12-01T10:42:33.6532760Z ok (4.311s) 2022-12-01T10:42:33.6533107Z test_register_functions_called_cuda_first_False_mixed_precision_False (__main__.TestHooks) 2022-12-01T10:42:33.6533657Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14363 2022-12-01T10:42:33.6534173Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14364 2022-12-01T10:42:33.6534905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6535368Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6535947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6536396Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6536988Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6537427Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6537980Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6538464Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6538921Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6539423Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6540104Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6540874Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6541401Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6541879Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6543080Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6543902Z warnings.warn( 2022-12-01T10:42:33.6545358Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
2022-12-01T10:42:33.6546122Z warnings.warn( 2022-12-01T10:42:33.6546892Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6547455Z warnings.warn( 2022-12-01T10:42:33.6548188Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6548737Z warnings.warn( 2022-12-01T10:42:33.6548990Z dist init r=1, world=2 2022-12-01T10:42:33.6549225Z dist init r=0, world=2 2022-12-01T10:42:33.6549462Z ok (4.312s) 2022-12-01T10:42:33.6549810Z test_register_functions_called_cuda_first_False_mixed_precision_True (__main__.TestHooks) 2022-12-01T10:42:33.6550344Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14442 2022-12-01T10:42:33.6550895Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14443 2022-12-01T10:42:33.6551500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6552048Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6552604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6553089Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6553672Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6554116Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6554675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6555155Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6555614Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6556125Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6556763Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6557469Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6557989Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6558469Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6559628Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1283: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 
2022-12-01T10:42:33.6560356Z warnings.warn( 2022-12-01T10:42:33.6561464Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1283: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2022-12-01T10:42:33.6562213Z warnings.warn( 2022-12-01T10:42:33.6563706Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6564466Z warnings.warn( 2022-12-01T10:42:33.6565584Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6566340Z warnings.warn( 2022-12-01T10:42:33.6567103Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6567640Z warnings.warn( 2022-12-01T10:42:33.6568376Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6569057Z warnings.warn( 2022-12-01T10:42:33.6569316Z dist init r=1, world=2 2022-12-01T10:42:33.6569572Z dist init r=0, world=2 2022-12-01T10:42:33.6569789Z ok (4.312s) 2022-12-01T10:42:33.6570137Z test_register_functions_called_cuda_first_True_mixed_precision_False (__main__.TestHooks) 2022-12-01T10:42:33.6570914Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14521 2022-12-01T10:42:33.6571533Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14522 2022-12-01T10:42:33.6572179Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6572650Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6573230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6573694Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6574285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6574737Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6575319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6575769Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6576228Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6576735Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6577360Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6578059Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6578680Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6579183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6580035Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6580588Z warnings.warn( 2022-12-01T10:42:33.6581343Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6581891Z warnings.warn( 2022-12-01T10:42:33.6582133Z dist init r=1, world=2 2022-12-01T10:42:33.6582388Z dist init r=0, world=2 2022-12-01T10:42:33.6582621Z ok (4.312s) 2022-12-01T10:42:33.6582952Z test_register_functions_called_cuda_first_True_mixed_precision_True (__main__.TestHooks) 2022-12-01T10:42:33.6583514Z Tests that ``_register_{pre|post}_backward_hooks()`` are called ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14600 2022-12-01T10:42:33.6584051Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14601 2022-12-01T10:42:33.6584664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6585105Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6585685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6586225Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6586821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6587246Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6587819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6588295Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6588749Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6589225Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6589886Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6590586Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6591121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6591576Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6592751Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1283: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2022-12-01T10:42:33.6593483Z warnings.warn( 2022-12-01T10:42:33.6594542Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1283: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2022-12-01T10:42:33.6595308Z warnings.warn( 2022-12-01T10:42:33.6596075Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6596634Z warnings.warn( 2022-12-01T10:42:33.6597391Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
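The deprecation warning repeated above points from the private `torch.distributed._all_gather_base` to `torch.distributed.all_gather_into_tensor`. A minimal sketch of the replacement call, assuming an initialized process group with a GPU backend; tensor sizes are illustrative:

    import torch
    import torch.distributed as dist

    # Assumes dist.init_process_group("nccl", ...) has already run on every rank.
    world_size = dist.get_world_size()
    local = torch.full((4,), float(dist.get_rank()), device="cuda")
    gathered = torch.empty(world_size * 4, device="cuda")
    dist.all_gather_into_tensor(gathered, local)  # replaces the deprecated _all_gather_base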
2022-12-01T10:42:33.6597926Z warnings.warn( 2022-12-01T10:42:33.6598160Z dist init r=1, world=2 2022-12-01T10:42:33.6598408Z dist init r=0, world=2 2022-12-01T10:42:33.6598646Z ok (4.412s) 2022-12-01T10:42:33.6598941Z test_transformer_no_grad_mixed_precision_False (__main__.TestNoGrad) 2022-12-01T10:42:33.6599600Z Tests that for an FSDP-wrapped transformer model with shared ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14679 2022-12-01T10:42:33.6600141Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14680 2022-12-01T10:42:33.6600728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6601177Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6601748Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6602217Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6603097Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6603644Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6604224Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6604693Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6605127Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6605620Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6606273Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6606945Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6607473Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6607948Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6609173Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6609936Z warnings.warn( 2022-12-01T10:42:33.6611029Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
2022-12-01T10:42:33.6611785Z warnings.warn( 2022-12-01T10:42:33.6612615Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6613177Z warnings.warn( 2022-12-01T10:42:33.6613911Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6614456Z warnings.warn( 2022-12-01T10:42:33.6615219Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6615774Z warnings.warn( 2022-12-01T10:42:33.6616512Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6617051Z warnings.warn( 2022-12-01T10:42:33.6617300Z dist init r=0, world=2 2022-12-01T10:42:33.6617552Z dist init r=1, world=2 2022-12-01T10:42:33.6617771Z ok (4.412s) 2022-12-01T10:42:33.6618090Z test_transformer_no_grad_mixed_precision_True (__main__.TestNoGrad) 2022-12-01T10:42:33.6618738Z Tests that for an FSDP-wrapped transformer model with shared ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14762 2022-12-01T10:42:33.6619262Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14763 2022-12-01T10:42:33.6619869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6620388Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6620969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6621424Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6622008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6622451Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6623314Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6623791Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6624245Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6624747Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6625394Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6626087Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
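The "Added key: store_based_barrier_key:1" and "Completed store-based barrier ... with 2 nodes" messages, together with "dist init r=0/1, world=2", come from process-group initialization in a 2-rank run. A minimal sketch of that kind of setup; the backend, environment variables, and init method are assumptions rather than the test harness's actual mechanism:

    import os
    import torch.distributed as dist

    # Illustrative 2-rank initialization; assumes RANK/WORLD_SIZE and
    # MASTER_ADDR/MASTER_PORT are provided by the launcher (env:// init method).
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])  # 2 in this log
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    print(f"dist init r={rank}, world={world_size}")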
2022-12-01T10:42:33.6626606Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6627077Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6628220Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1283: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2022-12-01T10:42:33.6628951Z warnings.warn( 2022-12-01T10:42:33.6630064Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1283: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2022-12-01T10:42:33.6630788Z warnings.warn( 2022-12-01T10:42:33.6631903Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6632644Z warnings.warn( 2022-12-01T10:42:33.6633754Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6634497Z warnings.warn( 2022-12-01T10:42:33.6635257Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6635796Z warnings.warn( 2022-12-01T10:42:33.6636526Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6637134Z warnings.warn( 2022-12-01T10:42:33.6637903Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6638457Z warnings.warn( 2022-12-01T10:42:33.6639190Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
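Likewise, the `_reduce_scatter_base` deprecation above names `torch.distributed.reduce_scatter_tensor` as the replacement. A hedged sketch under the same assumptions (initialized process group, CUDA backend, illustrative sizes):

    import torch
    import torch.distributed as dist

    # Assumes an initialized process group with a CUDA backend.
    world_size = dist.get_world_size()
    inp = torch.ones(world_size * 4, device="cuda")
    out = torch.empty(4, device="cuda")
    dist.reduce_scatter_tensor(out, inp)  # replaces the deprecated _reduce_scatter_base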
2022-12-01T10:42:33.6639733Z warnings.warn( 2022-12-01T10:42:33.6639981Z dist init r=0, world=2 2022-12-01T10:42:33.6640216Z dist init r=1, world=2 2022-12-01T10:42:33.6640499Z ok (4.412s) 2022-12-01T10:42:33.6640837Z test_param_change_after_init_mixed_precision_False (__main__.TestParamInit) 2022-12-01T10:42:33.6641508Z Tests that changing FSDP model parameter values in-place after FSDP ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14845 2022-12-01T10:42:33.6642067Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14846 2022-12-01T10:42:33.6643176Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6643631Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6644193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6644663Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6645241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6645684Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6646244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6646791Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6647262Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6647741Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6648411Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6649102Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6649625Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6650078Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6651303Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6652070Z warnings.warn( 2022-12-01T10:42:33.6653181Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
2022-12-01T10:42:33.6654017Z warnings.warn( 2022-12-01T10:42:33.6654769Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6655317Z warnings.warn( 2022-12-01T10:42:33.6656070Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6656607Z warnings.warn( 2022-12-01T10:42:33.6656834Z dist init r=1, world=2 2022-12-01T10:42:33.6657084Z dist init r=0, world=2 2022-12-01T10:42:33.6657324Z ok (4.211s) 2022-12-01T10:42:33.6657636Z test_param_change_after_init_mixed_precision_True (__main__.TestParamInit) 2022-12-01T10:42:33.6658307Z Tests that changing FSDP model parameter values in-place after FSDP ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14924 2022-12-01T10:42:33.6658862Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14925 2022-12-01T10:42:33.6659477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6659911Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6660487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6660954Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6661512Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6661957Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6662524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6662992Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6663481Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6663995Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6664660Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6665352Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6665854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6666324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6667487Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1283: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 
2022-12-01T10:42:33.6668211Z warnings.warn( 2022-12-01T10:42:33.6669246Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1283: UserWarning: Both mixed precision and an `auto_wrap_policy` were specified for FSDP, where the wrapped module has batch norm submodules. The batch norm submodules will be wrapped as separate FSDP instances with mixed precision disabled since some batch norm kernels do not support low precision. 2022-12-01T10:42:33.6669965Z warnings.warn( 2022-12-01T10:42:33.6671079Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6671897Z warnings.warn( 2022-12-01T10:42:33.6673011Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6673758Z warnings.warn( 2022-12-01T10:42:33.6674500Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6675047Z warnings.warn( 2022-12-01T10:42:33.6675799Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6676337Z warnings.warn( 2022-12-01T10:42:33.6676567Z dist init r=1, world=2 2022-12-01T10:42:33.6676818Z dist init r=0, world=2 2022-12-01T10:42:33.6677056Z ok (4.211s) 2022-12-01T10:42:33.6677373Z test_delayed_optim_step_offload_false_no_shard (__main__.TestParityWithDDP) 2022-12-01T10:42:33.6677913Z Tests the FSDP forward, backward, and optimizer step runtime by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15003 2022-12-01T10:42:33.6678442Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15004 2022-12-01T10:42:33.6679039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6679491Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6680119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6680607Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6681171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6681621Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6682197Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6682935Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6683369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6683874Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6684536Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6685209Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6685735Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6686205Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6686681Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6687147Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6688376Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6689251Z warnings.warn( 2022-12-01T10:42:33.6690362Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6691112Z warnings.warn( 2022-12-01T10:42:33.6691858Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:42:33.6692403Z warnings.warn( 2022-12-01T10:42:33.6693149Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6693688Z warnings.warn( 2022-12-01T10:42:33.6694584Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.6695248Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.6696242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.6696917Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.6697308Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6697800Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6698280Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6698759Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6699215Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6699689Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6700164Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6700621Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6701882Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.6702784Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2022-12-01T10:42:33.6704022Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.6704972Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2022-12-01T10:42:33.6705447Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6705913Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
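The FutureWarning above notes that torch.testing.assert_allclose() is deprecated in favor of torch.testing.assert_close(). A minimal example of the suggested replacement; the values are illustrative:

    import torch
    from torch.testing import assert_close

    expected = torch.tensor([1.0, 2.0, 3.0])
    actual = expected + 1e-7  # within assert_close's default float32 tolerances
    assert_close(actual, expected)  # replaces torch.testing.assert_allclose(actual, expected)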
2022-12-01T10:42:33.6706399Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6706879Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6707331Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6707813Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6708284Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6708760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6709218Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6709690Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6710161Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6710630Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6711085Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6711553Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6711915Z dist init r=0, world=2 2022-12-01T10:42:33.6712147Z dist init r=1, world=2 2022-12-01T10:42:33.6712383Z ok (11.422s) 2022-12-01T10:42:33.6712763Z test_delayed_optim_step_offload_false_none (__main__.TestParityWithDDP) 2022-12-01T10:42:33.6713298Z Tests the FSDP forward, backward, and optimizer step runtime by ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15086 2022-12-01T10:42:33.6713832Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15087 2022-12-01T10:42:33.6714447Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6714901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6715459Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6715924Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6716504Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6716929Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6717505Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6717968Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6718421Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6718901Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6719558Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6720252Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
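The repeated "Reducer buckets have been rebuilt in this iteration" lines are emitted by DistributedDataParallel, which these TestParityWithDDP cases use as the baseline against FSDP. A minimal sketch of a DDP-wrapped model of the kind that produces this message; the module, device handling, and input are illustrative assumptions:

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Assumes an initialized process group and one visible GPU per rank.
    device = torch.device("cuda", torch.cuda.current_device())
    model = nn.Linear(8, 8).to(device)  # placeholder module
    ddp_model = DDP(model, device_ids=[device.index])
    out = ddp_model(torch.randn(2, 8, device=device))
    out.sum().backward()  # DDP rebuilds its gradient buckets after the first iteration, logging the INFO line above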
2022-12-01T10:42:33.6720846Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6721303Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6721776Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6722265Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6723937Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6724785Z warnings.warn( 2022-12-01T10:42:33.6726223Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6726985Z warnings.warn( 2022-12-01T10:42:33.6727746Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6728294Z warnings.warn( 2022-12-01T10:42:33.6729023Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6729560Z warnings.warn( 2022-12-01T10:42:33.6730418Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6730987Z warnings.warn( 2022-12-01T10:42:33.6731740Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6732293Z warnings.warn( 2022-12-01T10:42:33.6733198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.6733862Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.6734780Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-12-01T10:42:33.6735442Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.6735851Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6736348Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6736814Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6737289Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6737761Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6738307Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6739596Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:965: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.6740529Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2022-12-01T10:42:33.6741778Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:965: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.6742648Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2022-12-01T10:42:33.6743107Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6743572Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6744048Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6744525Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6744976Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6745453Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6745927Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6746395Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6746897Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6747385Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6747853Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6748307Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6748783Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6749253Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:42:33.6749721Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6750175Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6750529Z dist init r=1, world=2 2022-12-01T10:42:33.6750785Z dist init r=0, world=2 2022-12-01T10:42:33.6751009Z ok (16.430s) 2022-12-01T10:42:33.6751357Z test_delayed_optim_step_offload_false_shard_grad_op (__main__.TestParityWithDDP) 2022-12-01T10:42:33.6751905Z Tests the FSDP forward, backward, and optimizer step runtime by ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15169 2022-12-01T10:42:33.6752438Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15170 2022-12-01T10:42:33.6753040Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6753498Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6754074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6754588Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6755174Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6755621Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6756193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6756642Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6757093Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6757587Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6758227Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6758925Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6759455Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6759929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6760388Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6760869Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6762098Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6763136Z warnings.warn( 2022-12-01T10:42:33.6764345Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. 
We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6765106Z warnings.warn( 2022-12-01T10:42:33.6765880Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6766425Z warnings.warn( 2022-12-01T10:42:33.6767219Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.6767755Z warnings.warn( 2022-12-01T10:42:33.6768527Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6769076Z warnings.warn( 2022-12-01T10:42:33.6769832Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.6770352Z warnings.warn( 2022-12-01T10:42:33.6771257Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.6772016Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.6772947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.6773604Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.6773989Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6774475Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6774958Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6775431Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6775889Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6776361Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6777636Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:965: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 
2022-12-01T10:42:33.6778515Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2022-12-01T10:42:33.6779776Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:965: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.6780650Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2022-12-01T10:42:33.6781099Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6781583Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6782061Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6782515Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6782984Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6783463Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6783920Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6784389Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6784857Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6785323Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6785777Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6786246Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6786711Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6787162Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6787692Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6788165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6788522Z dist init r=0, world=2 2022-12-01T10:42:33.6788752Z dist init r=1, world=2 2022-12-01T10:42:33.6788987Z ok (16.330s) 2022-12-01T10:42:33.6789320Z test_delayed_optim_step_offload_true_no_shard (__main__.TestParityWithDDP) 2022-12-01T10:42:33.6790430Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82490 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2022-12-01T10:42:33.6791206Z test_delayed_optim_step_offload_true_none (__main__.TestParityWithDDP) 2022-12-01T10:42:33.6791738Z Tests the FSDP forward, backward, and optimizer step runtime by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15252 2022-12-01T10:42:33.6792273Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15253 2022-12-01T10:42:33.6792867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6793317Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6793889Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6794349Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6794904Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.6795348Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.6795919Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.6796428Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.6796874Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.6797369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.6798027Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6798701Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.6799218Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.6799683Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.6800164Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6800634Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6801860Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6802869Z warnings.warn( 2022-12-01T10:42:33.6803996Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.6804844Z warnings.warn( 2022-12-01T10:42:33.6805199Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.6805681Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:42:33.6806691Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
[the preceding warning is repeated many more times by both ranks, interleaved with further "Reducer buckets have been rebuilt in this iteration." INFO messages]
2022-12-01T10:42:33.6870692Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.6871234Z warnings.warn(
[the preceding UserWarning is repeated once more by the second rank]
2022-12-01T10:42:33.6873329Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.6873901Z warnings.warn(
[the preceding UserWarning is repeated once more by the second rank]
2022-12-01T10:42:33.6876086Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.6876751Z warnings.warn(msg, FutureWarning)
[the preceding FutureWarning is repeated once more by the second rank]
2022-12-01T10:42:33.6878727Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
[the preceding INFO message is repeated many more times by both ranks]
2022-12-01T10:42:33.6889941Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
[the preceding warning is repeated many more times by both ranks]
2022-12-01T10:42:33.6909004Z dist init r=1, world=2
2022-12-01T10:42:33.6909253Z dist init r=0, world=2
2022-12-01T10:42:33.6909492Z ok (19.634s)
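The FSDP UserWarning repeated in this test's output recommends passing a `device_id` when the wrapped module is still on CPU. A minimal sketch of that construction, assuming one GPU per rank and an already-initialized process group (the module is a stand-in, not the model the test actually wraps):

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes dist.init_process_group(backend="nccl", ...) has already run,
    # as the test harness does before building the model.
    rank = dist.get_rank()
    device = torch.device("cuda", rank % torch.cuda.device_count())

    model = nn.Linear(8, 8)  # stand-in module, intentionally left on CPU
    # device_id lets FSDP move the module to the GPU before flattening and
    # sharding, which the warning says is more efficient and is needed for
    # sync_module_states=True.
    fsdp_model = FSDP(model, device_id=device, sync_module_states=True)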
2022-12-01T10:42:33.6909820Z test_delayed_optim_step_offload_true_shard_grad_op (__main__.TestParityWithDDP)
2022-12-01T10:42:33.6910365Z Tests the FSDP forward, backward, and optimizer step runtime by ...
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15335
2022-12-01T10:42:33.6910892Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15336
2022-12-01T10:42:33.6911500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.6912011Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.6912594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.6913062Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
[the two "loaded ... tests" UserWarnings above are repeated once more by the second rank]
2022-12-01T10:42:33.6915523Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.6916021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.6916679Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.6917366Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.6917867Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.6918339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.6918812Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.6919295Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.6920560Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.6921338Z warnings.warn(
[the preceding FSDP UserWarning is repeated once more by the second rank]
2022-12-01T10:42:33.6923805Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.6924275Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.6925281Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
[the preceding warning is repeated many more times by both ranks, interleaved with further "Reducer buckets have been rebuilt in this iteration." INFO messages]
2022-12-01T10:42:33.7022697Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.7023253Z warnings.warn(
[the preceding UserWarning is repeated once more by the second rank]
2022-12-01T10:42:33.7025309Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.7025860Z warnings.warn(
[the preceding UserWarning is repeated once more by the second rank]
2022-12-01T10:42:33.7028081Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.7028744Z warnings.warn(msg, FutureWarning)
[the preceding FutureWarning is repeated once more by the second rank]
2022-12-01T10:42:33.7030811Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
[the preceding INFO message is repeated many more times by both ranks]
2022-12-01T10:42:33.7041787Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
[the preceding warning is repeated many more times by both ranks]
2022-12-01T10:42:33.7061214Z dist init r=0, world=2
2022-12-01T10:42:33.7061466Z dist init r=1, world=2
2022-12-01T10:42:33.7061704Z ok (19.642s)
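The deprecation warnings in this test's output name their replacements directly. A minimal sketch of the suggested migrations, assuming a two-rank default process group on CUDA devices is already initialized (tensor shapes are illustrative):

    import torch
    import torch.distributed as dist
    from torch.testing import assert_close

    # Assumes dist.init_process_group(backend="nccl", ...) has already run.
    world_size = dist.get_world_size()
    chunk = torch.full((4,), float(dist.get_rank()), device="cuda")

    # Replacement for the private torch.distributed._all_gather_base:
    gathered = torch.empty(world_size * 4, device="cuda")
    dist.all_gather_into_tensor(gathered, chunk)

    # Replacement for the private torch.distributed._reduce_scatter_base:
    reduced = torch.empty(4, device="cuda")
    dist.reduce_scatter_tensor(reduced, gathered)

    # Replacement for the deprecated torch.testing.assert_allclose:
    assert_close(reduced, reduced.clone())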
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15418
2022-12-01T10:42:33.7063195Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15419
2022-12-01T10:42:33.7063810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.7064247Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.7064824Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.7065297Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
    [both UserWarnings above are emitted once per worker process]
2022-12-01T10:42:33.7067770Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.7068274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.7068930Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7069620Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7070197Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.7070671Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.7071151Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.7071635Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.7072854Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.7073614Z warnings.warn(
    [the FSDP UserWarning above is emitted once per worker process]
2022-12-01T10:42:33.7076230Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
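A minimal sketch of the setup the FSDP UserWarning above recommends, assuming a per-rank GPU and an already-initialized process group; the module and sizes are illustrative, not taken from the test:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes torch.distributed is already initialized (2 ranks) and each rank owns one GPU.
    device = torch.device("cuda", torch.cuda.current_device())
    model = nn.Linear(8, 8).to(device)        # module already on GPU, as sync_module_states=True expects
    fsdp_model = FSDP(
        model,
        device_id=device,          # lets FSDP run flattening/sharding on GPU rather than CPU
        sync_module_states=True,   # broadcasts rank 0's module states; requires GPU communication
    )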
2022-12-01T10:42:33.7076758Z warnings.warn(
    [the _all_gather_base deprecation warning above is emitted once per worker process]
2022-12-01T10:42:33.7079013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.7079681Z warnings.warn(msg, FutureWarning)
    [the FutureWarning above is emitted once per worker process]
2022-12-01T10:42:33.7081672Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
    [the INFO line above is emitted 8 times in a row]
2022-12-01T10:42:33.7086496Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.)
2022-12-01T10:42:33.7087475Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
    [the autograd UserWarning above is emitted once per worker process]
2022-12-01T10:42:33.7090077Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.7090566Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
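A minimal sketch of the replacement named by the _all_gather_base deprecation warning, assuming an initialized process group and one GPU per rank; tensor names and sizes are illustrative:

    import torch
    import torch.distributed as dist

    # Assumes dist.init_process_group(...) has already run.
    world_size = dist.get_world_size()
    shard = torch.full((4,), float(dist.get_rank()), device="cuda")   # each rank's local contribution
    output = torch.empty(world_size * 4, device="cuda")               # flat tensor receiving every shard

    # Public replacement the warning asks for; the older private torch.distributed._all_gather_base
    # took the same (output, input) pair.
    dist.all_gather_into_tensor(output, shard)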
2022-12-01T10:42:33.7091031Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
    [the INFO line above is emitted 12 times in a row]
2022-12-01T10:42:33.7096641Z dist init r=1, world=2
2022-12-01T10:42:33.7096893Z dist init r=0, world=2
2022-12-01T10:42:33.7097113Z ok (4.411s)
2022-12-01T10:42:33.7097449Z test_delayed_reduce_scatter_offload_false_none (__main__.TestParityWithDDP)
2022-12-01T10:42:33.7098585Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82704 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s)
2022-12-01T10:42:33.7099395Z test_delayed_reduce_scatter_offload_false_shard_grad_op (__main__.TestParityWithDDP)
2022-12-01T10:42:33.7100503Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82398 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s)
2022-12-01T10:42:33.7101291Z test_delayed_reduce_scatter_offload_true_no_shard (__main__.TestParityWithDDP)
2022-12-01T10:42:33.7101829Z Tests the FSDP forward, backward, and optimizer step runtime by ...
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15501
2022-12-01T10:42:33.7102428Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15502
2022-12-01T10:42:33.7103025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.7103477Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.7104051Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.7104519Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
    [both UserWarnings above are emitted once per worker process]
2022-12-01T10:42:33.7107007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.7107512Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.7108173Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7108837Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7109359Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.7109836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.7110317Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.7110794Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.7112087Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.7112866Z warnings.warn(
    [the FSDP UserWarning above is emitted once per worker process]
2022-12-01T10:42:33.7115089Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.7115576Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
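The store_based_barrier_key INFO lines above are printed while the worker processes rendezvous through torch.distributed. A minimal sketch of the kind of per-process initialization that produces them, with an illustrative address, port, and backend (not taken from the test harness):

    import os

    import torch.distributed as dist

    # Run by every worker process; rank and world size normally come from the test launcher.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")   # illustrative rendezvous address
    os.environ.setdefault("MASTER_PORT", "29500")       # illustrative port
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "2"))

    # init_process_group finishes with a store-based barrier across all ranks, which is what the
    # "store_based_barrier_key" INFO lines in the log correspond to.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)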
2022-12-01T10:42:33.7116581Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
2022-12-01T10:42:33.7118610Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
    [the two messages above repeat, interleaved, for the rest of this test: 46 decref warnings and 10 Reducer-buckets INFO lines in total]
2022-12-01T10:42:33.7178653Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.7179253Z warnings.warn(
    [the UserWarning above is emitted once per worker process]
2022-12-01T10:42:33.7181477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.7182127Z warnings.warn(msg, FutureWarning)
    [the FutureWarning above is emitted once per worker process]
2022-12-01T10:42:33.7184126Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
    [the INFO line above is emitted 22 times in a row]
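A minimal sketch of the migration the FutureWarning above asks for; the tensors and tolerances are illustrative:

    import torch
    from torch.testing import assert_close

    actual = torch.tensor([1.0, 2.0, 3.0])
    expected = torch.tensor([1.0, 2.0, 3.0 + 1e-7])

    # Deprecated call flagged by the FutureWarning:
    #   torch.testing.assert_allclose(actual, expected)
    # Suggested replacement; rtol/atol are optional keyword-only arguments.
    assert_close(actual, expected, rtol=1.3e-6, atol=1e-5)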
2022-12-01T10:42:33.7195061Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
    [the warning above is emitted 18 times in a row]
2022-12-01T10:42:33.7216910Z dist init r=1, world=2
2022-12-01T10:42:33.7217160Z dist init r=0, world=2
2022-12-01T10:42:33.7217400Z ok (4.712s)
2022-12-01T10:42:33.7217723Z test_delayed_reduce_scatter_offload_true_none (__main__.TestParityWithDDP)
2022-12-01T10:42:33.7218901Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82399 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s)
2022-12-01T10:42:33.7219716Z test_delayed_reduce_scatter_offload_true_shard_grad_op (__main__.TestParityWithDDP)
2022-12-01T10:42:33.7220840Z Tests the FSDP forward, backward, and optimizer step runtime by ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/82403 for platform(s) linux, rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s)
2022-12-01T10:42:33.7221768Z test_mixture_of_experts_offload_false_no_shard_norm_type_None (__main__.TestParityWithDDP) ...
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15584
2022-12-01T10:42:33.7222317Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15585
2022-12-01T10:42:33.7222925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.7223375Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.7223948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.7224400Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
    [both UserWarnings above are emitted once per worker process]
2022-12-01T10:42:33.7226957Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.7227461Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.7228108Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7228797Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7229316Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.7229788Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.7230995Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.7231764Z warnings.warn(
    [the FSDP UserWarning above is emitted once per worker process]
2022-12-01T10:42:33.7234004Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0
2022-12-01T10:42:33.7234491Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1
2022-12-01T10:42:33.7235198Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
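The store_based_barrier_key counter climbs past 1 below, which is consistent with this mixture-of-experts test creating additional process groups on top of the default one. A minimal, illustrative sketch of subgroup creation (the exact groups the test creates are not shown in the log):

    import torch.distributed as dist

    # Assumes the default process group (world_size == 2) is already initialized.
    # Creating an extra group runs another store-based barrier, so each new group
    # bumps the store_based_barrier_key counter seen in the log.
    expert_group = dist.new_group(ranks=[0, 1])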
2022-12-01T10:42:33.7236136Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.7236682Z warnings.warn(
2022-12-01T10:42:33.7237084Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
    [the _all_gather_base UserWarning above is emitted once per worker process]
2022-12-01T10:42:33.7238563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.7238707Z warnings.warn(msg, FutureWarning)
    [the FutureWarning above is emitted once per worker process]
2022-12-01T10:42:33.7239860Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1
2022-12-01T10:42:33.7240103Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0
2022-12-01T10:42:33.7240623Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
2022-12-01T10:42:33.7241023Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
2022-12-01T10:42:33.7241776Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
    [the warning above is emitted 6 times here]
2022-12-01T10:42:33.7246039Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1
2022-12-01T10:42:33.7246282Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0
2022-12-01T10:42:33.7246688Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
2022-12-01T10:42:33.7247078Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
    [6 more decref warnings]
2022-12-01T10:42:33.7251806Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1
2022-12-01T10:42:33.7252049Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0
2022-12-01T10:42:33.7252448Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes.
2022-12-01T10:42:33.7252841Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes.
    [22 more decref warnings]
2022-12-01T10:42:33.7269367Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1
2022-12-01T10:42:33.7269609Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0
2022-12-01T10:42:33.7270010Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes.
2022-12-01T10:42:33.7270408Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes.
    [6 more decref warnings]
2022-12-01T10:42:33.7275079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1
2022-12-01T10:42:33.7275316Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0
2022-12-01T10:42:33.7275712Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes.
2022-12-01T10:42:33.7276084Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes.
    [6 more decref warnings]
2022-12-01T10:42:33.7280789Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1
2022-12-01T10:42:33.7281033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0
2022-12-01T10:42:33.7281434Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes.
2022-12-01T10:42:33.7281876Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes.
2022-12-01T10:42:33.7282807Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it.
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7283544Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7284282Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7285001Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7285730Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7286546Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7286787Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2022-12-01T10:42:33.7287031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2022-12-01T10:42:33.7287427Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.7287802Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.7288543Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7289277Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7290012Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7290796Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7291550Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7292267Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7292506Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2022-12-01T10:42:33.7292745Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2022-12-01T10:42:33.7293149Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.7293546Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.7294278Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7295000Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7295797Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7296521Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7297253Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7298037Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.7298282Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2022-12-01T10:42:33.7298683Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.7298926Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2022-12-01T10:42:33.7299322Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.7300094Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7300837Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7301568Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7302294Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7303028Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7303754Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7304066Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2022-12-01T10:42:33.7304474Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.7304715Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2022-12-01T10:42:33.7305112Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.7305843Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.7306568Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7307307Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7308031Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7308762Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7309533Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7309786Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2022-12-01T10:42:33.7310022Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2022-12-01T10:42:33.7310427Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2022-12-01T10:42:33.7311161Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7311895Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7312617Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7313014Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2022-12-01T10:42:33.7313783Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7314502Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7315240Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7315359Z dist init r=1, world=2 2022-12-01T10:42:33.7315466Z dist init r=0, world=2 2022-12-01T10:42:33.7315570Z ok (5.012s) 2022-12-01T10:42:33.7315919Z test_mixture_of_experts_offload_false_none_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15787 2022-12-01T10:42:33.7316136Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15788 2022-12-01T10:42:33.7316512Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.7316687Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.7317049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.7317242Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.7317613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.7317834Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.7318225Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.7318413Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.7318661Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.7318904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.7319303Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.7319678Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.7319911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.7320141Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.7321123Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
2022-12-01T10:42:33.7321239Z warnings.warn( 2022-12-01T10:42:33.7322212Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.7322539Z warnings.warn( 2022-12-01T10:42:33.7322795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:42:33.7323043Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:42:33.7323451Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:42:33.7323841Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:42:33.7324467Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.7324566Z warnings.warn( 2022-12-01T10:42:33.7325190Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.7325299Z warnings.warn( 2022-12-01T10:42:33.7325928Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.7326038Z warnings.warn( 2022-12-01T10:42:33.7326664Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.7326775Z warnings.warn( 2022-12-01T10:42:33.7327619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.7327774Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.7328549Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-12-01T10:42:33.7328688Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.7328915Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-12-01T10:42:33.7329165Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-12-01T10:42:33.7329567Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.7330314Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7331041Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7331785Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7332268Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.7333006Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7333730Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7334454Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7334699Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-12-01T10:42:33.7334940Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-12-01T10:42:33.7335331Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-12-01T10:42:33.7335722Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-12-01T10:42:33.7336515Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.7337266Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7337992Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7338735Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7339470Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7340191Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7340474Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2022-12-01T10:42:33.7340942Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.7341185Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2022-12-01T10:42:33.7341577Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.7342317Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7343038Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7343779Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7344504Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7345234Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7346002Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7346754Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7347479Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7348214Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7348932Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7349669Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7350456Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7351188Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7351908Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.7352647Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7353371Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7354097Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7354866Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7355617Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7356341Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7357077Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7357801Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7358048Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2022-12-01T10:42:33.7358292Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2022-12-01T10:42:33.7358758Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.7359138Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.7359870Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7360594Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7361331Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7362053Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7363021Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7363844Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7364100Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2022-12-01T10:42:33.7364343Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2022-12-01T10:42:33.7364742Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.7365133Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.7365870Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7366599Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7367328Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7368050Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7368890Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7369612Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7369854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2022-12-01T10:42:33.7370098Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2022-12-01T10:42:33.7370497Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2022-12-01T10:42:33.7370873Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2022-12-01T10:42:33.7371605Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7372325Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7373113Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7373855Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7374586Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7375315Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.7375559Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2022-12-01T10:42:33.7375802Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2022-12-01T10:42:33.7376196Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.7376586Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.7377321Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7378108Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7378839Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7379570Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7380305Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7381026Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7381274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2022-12-01T10:42:33.7381515Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2022-12-01T10:42:33.7381961Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.7382371Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.7383086Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.7383809Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7384531Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7385247Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7385981Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7386771Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7387018Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2022-12-01T10:42:33.7387255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2022-12-01T10:42:33.7387654Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.7388050Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.7388796Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7389524Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7390254Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7391027Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7391761Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7392478Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7392727Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2022-12-01T10:42:33.7392969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2022-12-01T10:42:33.7393368Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.7393765Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.7394483Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7395212Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7396015Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7396741Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7397473Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7398199Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
2022-12-01T10:42:33.7398446Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1
2022-12-01T10:42:33.7398680Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0
2022-12-01T10:42:33.7399077Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes.
2022-12-01T10:42:33.7399478Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes.
2022-12-01T10:42:33.7400263Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
2022-12-01T10:42:33.7404356Z dist init r=1, world=2
2022-12-01T10:42:33.7404469Z dist init r=0, world=2
2022-12-01T10:42:33.7404571Z ok (5.113s)
2022-12-01T10:42:33.7404911Z test_mixture_of_experts_offload_false_shard_grad_op_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15990
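The "Started process N with pid ..." and "dist init r=N, world=2" lines come from PyTorch's internal multi-process test harness (torch.testing._internal.common_distributed), which runs each test in one worker process per rank. A rough, hedged equivalent using only public APIs; the run_rank function, the gloo backend and the TCP port below are illustrative choices, not what the harness literally does:

    # Two-rank launch sketch, analogous to the "dist init r=N, world=2" lines.
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def run_rank(rank: int, world_size: int) -> None:
        # Rendezvous address/port and backend are illustrative assumptions.
        dist.init_process_group(
            backend="gloo",
            init_method="tcp://127.0.0.1:29500",
            rank=rank,
            world_size=world_size,
        )
        print(f"dist init r={rank}, world={world_size}")
        dist.barrier()
        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(run_rank, args=(2,), nprocs=2)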
2022-12-01T10:42:33.7405132Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15991
2022-12-01T10:42:33.7405516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.7405693Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.7406072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.7406267Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.7406632Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.7406812Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.7407189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.7407360Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.7407607Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.7407853Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.7408250Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7408642Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7408876Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.7409167Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.7410165Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.7410282Z warnings.warn(
2022-12-01T10:42:33.7411265Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.7411381Z warnings.warn(
2022-12-01T10:42:33.7411609Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1
2022-12-01T10:42:33.7411851Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0
2022-12-01T10:42:33.7412248Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
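The fully_sharded_data_parallel.py UserWarning above recommends passing a device_id to FSDP so the module is moved to the GPU before flattening and sharding, and notes that sync_module_states=True requires the module to be on a GPU. A hedged sketch of that recommendation; it assumes a CUDA device is available and a process group is already initialized, and the wrapped nn.Linear is only a stand-in for the test's real model:

    # Sketch of the device_id recommendation from the FSDP warning above.
    # Assumes torch.distributed is initialized and a CUDA device is visible.
    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    model = nn.Linear(8, 8)                      # module constructed on CPU
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),   # let FSDP move it to the GPU
        sync_module_states=True,                 # needs GPU communication
    )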
2022-12-01T10:42:33.7412869Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.7413041Z warnings.warn(
2022-12-01T10:42:33.7413443Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
2022-12-01T10:42:33.7414066Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.7414176Z warnings.warn(
2022-12-01T10:42:33.7414807Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.7414917Z warnings.warn(
2022-12-01T10:42:33.7415525Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.7415639Z warnings.warn(
2022-12-01T10:42:33.7416413Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.7416556Z warnings.warn(msg, FutureWarning)
2022-12-01T10:42:33.7417319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.7417461Z warnings.warn(msg, FutureWarning)
2022-12-01T10:42:33.7417707Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0
2022-12-01T10:42:33.7417953Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1
2022-12-01T10:42:33.7418404Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
2022-12-01T10:42:33.7418817Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
2022-12-01T10:42:33.7419566Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
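The deprecation warnings above name the public replacements for three calls used by this test suite: torch.distributed.all_gather_into_tensor for _all_gather_base, torch.distributed.reduce_scatter_tensor for _reduce_scatter_base, and torch.testing.assert_close for assert_allclose. A hedged migration sketch; it assumes an initialized process group and a CUDA device, and the tensor shapes are illustrative:

    # Replacements suggested by the deprecation warnings above (illustrative).
    import torch
    import torch.distributed as dist

    world_size = dist.get_world_size()
    local = torch.ones(4, device="cuda")
    gathered = torch.empty(world_size * 4, device="cuda")
    dist.all_gather_into_tensor(gathered, local)      # was _all_gather_base
    scattered = torch.empty(4, device="cuda")
    dist.reduce_scatter_tensor(scattered, gathered)   # was _reduce_scatter_base
    torch.testing.assert_close(local, torch.ones(4, device="cuda"))  # was assert_allclose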
2022-12-01T10:42:33.7421029Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
2022-12-01T10:42:33.7423548Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1
2022-12-01T10:42:33.7423790Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0
2022-12-01T10:42:33.7424187Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
2022-12-01T10:42:33.7424577Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
2022-12-01T10:42:33.7429277Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1
2022-12-01T10:42:33.7429517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0
2022-12-01T10:42:33.7429913Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes.
2022-12-01T10:42:33.7430300Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes.
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
2022-12-01T10:42:33.7447110Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0
2022-12-01T10:42:33.7447350Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1
2022-12-01T10:42:33.7447753Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes.
2022-12-01T10:42:33.7448143Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes.
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
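The store_based_barrier_key INFO lines that recur throughout these tests are printed by c10d's store-based barrier: each call that creates a process group writes an incrementing key to the shared store and waits until all world_size ranks have checked in, which is why every key appears once per rank followed by two "Completed store-based barrier" lines. A hedged sketch of calls that trigger it in this version of PyTorch; the backend and env:// rendezvous are assumptions:

    # Each process-group creation below performs a store-based barrier and
    # bumps store_based_barrier_key:<n>, producing INFO lines like those above.
    # Assumes RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are set.
    import torch.distributed as dist

    dist.init_process_group(backend="nccl", init_method="env://")  # key:1
    subgroup = dist.new_group(ranks=[0, 1])                        # key:2, and so on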
2022-12-01T10:42:33.7452590Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
2022-12-01T10:42:33.7452834Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0
2022-12-01T10:42:33.7453074Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1
2022-12-01T10:42:33.7453467Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes.
2022-12-01T10:42:33.7453840Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes.
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
2022-12-01T10:42:33.7458518Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0
2022-12-01T10:42:33.7458814Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1
2022-12-01T10:42:33.7459214Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes.
2022-12-01T10:42:33.7459608Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes.
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
2022-12-01T10:42:33.7464264Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0
2022-12-01T10:42:33.7464504Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1
2022-12-01T10:42:33.7464900Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes.
2022-12-01T10:42:33.7465275Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes.
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
2022-12-01T10:42:33.7469970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1
2022-12-01T10:42:33.7470209Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0
2022-12-01T10:42:33.7470604Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes.
2022-12-01T10:42:33.7471001Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes.
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
2022-12-01T10:42:33.7475668Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0
2022-12-01T10:42:33.7475905Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1
2022-12-01T10:42:33.7476302Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes.
2022-12-01T10:42:33.7476700Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes.
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
2022-12-01T10:42:33.7481395Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1
2022-12-01T10:42:33.7481682Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0
2022-12-01T10:42:33.7482096Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes.
2022-12-01T10:42:33.7482705Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes.
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
2022-12-01T10:42:33.7487422Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1
2022-12-01T10:42:33.7487657Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0
2022-12-01T10:42:33.7488053Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes.
2022-12-01T10:42:33.7488447Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes.
[... identical decref warning(s) from python_variable.cpp:327 omitted ...]
2022-12-01T10:42:33.7492987Z dist init r=0, world=2
2022-12-01T10:42:33.7493096Z dist init r=1, world=2
2022-12-01T10:42:33.7493193Z ok (5.113s)
2022-12-01T10:42:33.7493534Z test_mixture_of_experts_offload_true_no_shard_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16193
2022-12-01T10:42:33.7493755Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16194
2022-12-01T10:42:33.7494130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.7494306Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.7494668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.7494925Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.7495297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.7495471Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.7495835Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.7496021Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.7496269Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.7496507Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.7496902Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7497286Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7497513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.7497742Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.7498708Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.7498823Z warnings.warn( 2022-12-01T10:42:33.7499837Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.7499963Z warnings.warn( 2022-12-01T10:42:33.7500204Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:42:33.7500605Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:42:33.7500840Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:42:33.7501232Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:42:33.7501973Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7502709Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7502935Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-12-01T10:42:33.7503173Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-12-01T10:42:33.7503558Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.7504018Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.7504260Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-12-01T10:42:33.7504491Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-12-01T10:42:33.7504879Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 
2022-12-01T10:42:33.7505623Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7506006Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-12-01T10:42:33.7506750Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7506989Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2022-12-01T10:42:33.7507211Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2022-12-01T10:42:33.7507595Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.7508325Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7508763Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.7509509Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7509742Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2022-12-01T10:42:33.7509978Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2022-12-01T10:42:33.7510364Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.7511094Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7511816Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7512557Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref)
2022-12-01T10:42:33.7513345Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
2022-12-01T10:42:33.7526433Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes.
2022-12-01T10:42:33.7542181Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1
2022-12-01T10:42:33.7542424Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0
2022-12-01T10:42:33.7542812Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes.
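The deallocation warning above names torch.Tensor._fix_weakref(). A minimal sketch of the pattern the message describes, purely for illustration; the tensor and variable names below are assumptions, not the FSDP-internal code that actually emits the warning in this test:

    import weakref
    import torch

    t = torch.ones(3)               # some tensor owned by Python code (illustrative)
    wr = weakref.ref(t)             # take out a weak reference to it

    resurrected = wr()              # dereference the weak reference later
    if resurrected is not None:
        # The warning asks for this call after dereferencing, so the Tensor's
        # PyObject bookkeeping stays consistent when the tensor is deallocated.
        resurrected._fix_weakref()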
2022-12-01T10:42:33.7543918Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes.
2022-12-01T10:42:33.7544955Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1
2022-12-01T10:42:33.7545194Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0
2022-12-01T10:42:33.7545587Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes.
2022-12-01T10:42:33.7546943Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.7547042Z warnings.warn(
2022-12-01T10:42:33.7547434Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes.
2022-12-01T10:42:33.7549711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.7549847Z warnings.warn(msg, FutureWarning)
2022-12-01T10:42:33.7550982Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1
2022-12-01T10:42:33.7551223Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0
2022-12-01T10:42:33.7551609Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes.
2022-12-01T10:42:33.7552003Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes.
2022-12-01T10:42:33.7556683Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1
2022-12-01T10:42:33.7557078Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes.
2022-12-01T10:42:33.7557321Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0
2022-12-01T10:42:33.7557713Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes.
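The two deprecation warnings above each point to a drop-in replacement. A short sketch of the migrations they recommend; tensor shapes below are illustrative assumptions, and an already-initialized process group is assumed:

    import torch
    import torch.distributed as dist

    world_size = dist.get_world_size()
    local = torch.ones(4)                          # this rank's shard (illustrative)
    gathered = torch.empty(world_size * 4)         # flat output buffer

    # Deprecated private call: dist._all_gather_base(gathered, local)
    # Recommended replacement per the warning above:
    dist.all_gather_into_tensor(gathered, local)

    # Deprecated: torch.testing.assert_allclose(gathered[:4], local)
    # Recommended replacement:
    torch.testing.assert_close(gathered[:4], local)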
2022-12-01T10:42:33.7562655Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1
2022-12-01T10:42:33.7562899Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0
2022-12-01T10:42:33.7563381Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes.
2022-12-01T10:42:33.7563780Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes.
2022-12-01T10:42:33.7568438Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1
2022-12-01T10:42:33.7568669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0
2022-12-01T10:42:33.7569066Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes.
2022-12-01T10:42:33.7569451Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes.
2022-12-01T10:42:33.7574123Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1
2022-12-01T10:42:33.7574515Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes.
2022-12-01T10:42:33.7574750Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0
2022-12-01T10:42:33.7575128Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes.
2022-12-01T10:42:33.7579801Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1
2022-12-01T10:42:33.7580199Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes.
2022-12-01T10:42:33.7580440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0
2022-12-01T10:42:33.7580836Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes.
2022-12-01T10:42:33.7585537Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1
2022-12-01T10:42:33.7585767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0
2022-12-01T10:42:33.7586169Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes.
2022-12-01T10:42:33.7586540Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes.
2022-12-01T10:42:33.7591248Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1
2022-12-01T10:42:33.7591481Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0
2022-12-01T10:42:33.7591874Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes.
2022-12-01T10:42:33.7592274Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes.
2022-12-01T10:42:33.7596949Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1
2022-12-01T10:42:33.7597182Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0
2022-12-01T10:42:33.7597627Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes.
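Each store_based_barrier_key:N round above corresponds to one more process-group initialization on this pair of ranks: each rank adds its key to the shared store and then waits until entries from all world_size ranks are present. A hedged sketch of the per-process call pattern that produces one such round per group; the rendezvous settings, backend, and group membership below are assumptions, not read from the test:

    import os
    import torch.distributed as dist

    def init_worker(rank: int, world_size: int = 2) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")   # assumed single-host rendezvous
        os.environ.setdefault("MASTER_PORT", "29500")

        # Finishes with a store-based barrier, logged as "store_based_barrier_key:1".
        dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)

        # Every additional group creation runs another store-based barrier, which is
        # why the key counter keeps climbing as the test builds more subgroups.
        subgroup = dist.new_group(ranks=list(range(world_size)))

        dist.destroy_process_group()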
2022-12-01T10:42:33.7613778Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes.
2022-12-01T10:42:33.7629422Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1
2022-12-01T10:42:33.7629659Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0
2022-12-01T10:42:33.7630106Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes.
2022-12-01T10:42:33.7630507Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes.
2022-12-01T10:42:33.7635160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1
2022-12-01T10:42:33.7635378Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0
2022-12-01T10:42:33.7635782Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes.
2022-12-01T10:42:33.7636169Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes.
2022-12-01T10:42:33.7640765Z dist init r=0, world=2
2022-12-01T10:42:33.7640868Z dist init r=1, world=2
2022-12-01T10:42:33.7640966Z ok (5.313s)
2022-12-01T10:42:33.7641311Z test_mixture_of_experts_offload_true_none_norm_type_None (__main__.TestParityWithDDP) ...
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16408
2022-12-01T10:42:33.7641534Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16409
2022-12-01T10:42:33.7641893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.7642060Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.7642657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.7642860Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.7644276Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.7644522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.7644926Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7645312Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7645540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.7645763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.7646751Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.7646861Z warnings.warn(
2022-12-01T10:42:33.7648259Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0
2022-12-01T10:42:33.7648501Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1
2022-12-01T10:42:33.7648887Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
2022-12-01T10:42:33.7649282Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
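The fully_sharded_data_parallel.py warning above recommends passing device_id so FSDP can move a CPU-constructed module onto the GPU before flattening and sharding, which is also what lets sync_module_states=True use GPU communication. A minimal sketch of that recommendation; the module is an illustrative stand-in, and an already-initialized NCCL process group plus a visible GPU are assumed:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    model = nn.Linear(8, 8)                       # built on CPU, as in the warning
    local_rank = torch.cuda.current_device()      # assumes CUDA is available

    # device_id tells FSDP which GPU to move the module to before sharding.
    fsdp_model = FSDP(model, device_id=local_rank, sync_module_states=True)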
2022-12-01T10:42:33.7650019Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7650748Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7650992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-12-01T10:42:33.7651228Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-12-01T10:42:33.7651621Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.7652015Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.7652249Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-12-01T10:42:33.7652486Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-12-01T10:42:33.7652877Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-12-01T10:42:33.7653651Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7654062Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-12-01T10:42:33.7654795Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7655037Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2022-12-01T10:42:33.7655272Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2022-12-01T10:42:33.7655657Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.7656387Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7656774Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.7657511Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7657815Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2022-12-01T10:42:33.7658056Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2022-12-01T10:42:33.7658447Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.7658836Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.7659557Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7660291Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7661025Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7661750Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7662535Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7663271Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7664010Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7664747Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.7688426Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7689213Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7689958Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7690199Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2022-12-01T10:42:33.7690434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2022-12-01T10:42:33.7690837Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.7691566Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7691959Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.7692695Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7692932Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2022-12-01T10:42:33.7693211Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2022-12-01T10:42:33.7693608Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2022-12-01T10:42:33.7694336Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7694955Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.7695069Z warnings.warn( 2022-12-01T10:42:33.7695464Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 
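Note on the repeated python_variable.cpp:327 warning above: it concerns Python weak references to tensors and the internal `_fix_weakref()` call named in the message. A hedged sketch of the pattern the message describes, not a reproduction of what these FSDP tests actually do (`_fix_weakref()` is an internal API and the toy tensor is illustrative):

    import weakref
    import torch

    t = torch.ones(3)
    ref = weakref.ref(t)  # take a weak reference to the tensor
    _ = ref()             # dereference it to get the tensor back
    t._fix_weakref()      # the internal call the warning asks for after dereferencing
    del t                 # per the warning text, deallocation should then not report live PyObject references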
2022-12-01T10:42:33.7696195Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7696820Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.7696926Z warnings.warn( 2022-12-01T10:42:33.7697556Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.7697670Z warnings.warn( 2022-12-01T10:42:33.7698341Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.7698446Z warnings.warn( 2022-12-01T10:42:33.7699213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.7699351Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.7700111Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.7700249Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.7700498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2022-12-01T10:42:33.7700737Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2022-12-01T10:42:33.7701135Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.7701522Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.7702262Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7703063Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7703796Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7704522Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7705261Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7705990Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7706236Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2022-12-01T10:42:33.7706457Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2022-12-01T10:42:33.7706846Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.7707296Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.7708330Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:965: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.7708530Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2022-12-01T10:42:33.7709552Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:965: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.7709755Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2022-12-01T10:42:33.7709995Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2022-12-01T10:42:33.7710236Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2022-12-01T10:42:33.7710631Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.7711024Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 
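Note on the deprecation warnings above: torch.distributed._all_gather_base, torch.distributed._reduce_scatter_base, and torch.testing.assert_allclose each name their replacement. A self-contained sketch of those replacements, where the shapes, the ones-valued tensors, and the final check are illustrative assumptions rather than anything taken from these tests:

    import torch
    import torch.distributed as dist

    # Assumption: an NCCL process group is already initialized and this rank has a GPU.
    world_size = dist.get_world_size()
    local = torch.ones(4, device="cuda")

    # all_gather_into_tensor replaces the private _all_gather_base call flagged above.
    gathered = torch.empty(world_size * 4, device="cuda")
    dist.all_gather_into_tensor(gathered, local)

    # reduce_scatter_tensor replaces the private _reduce_scatter_base call flagged above.
    scattered = torch.empty(4, device="cuda")
    dist.reduce_scatter_tensor(scattered, gathered)  # default op is SUM

    # assert_close replaces the deprecated torch.testing.assert_allclose.
    torch.testing.assert_close(scattered, torch.full((4,), float(world_size), device="cuda"))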
2022-12-01T10:42:33.7711765Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7712571Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7713291Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7714032Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7714757Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7715492Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7715719Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2022-12-01T10:42:33.7715960Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2022-12-01T10:42:33.7716408Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.7716816Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.7717547Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7718276Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7718997Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7719732Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7720450Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7721250Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7721487Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2022-12-01T10:42:33.7721720Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2022-12-01T10:42:33.7722110Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2022-12-01T10:42:33.7722796Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2022-12-01T10:42:33.7723556Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7724287Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7725012Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7725829Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7726568Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7727292Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7727521Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2022-12-01T10:42:33.7727760Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2022-12-01T10:42:33.7728153Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2022-12-01T10:42:33.7728547Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2022-12-01T10:42:33.7729282Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7730013Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7730879Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7731604Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7732322Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7733055Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7733297Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2022-12-01T10:42:33.7733531Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2022-12-01T10:42:33.7733928Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2022-12-01T10:42:33.7734324Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2022-12-01T10:42:33.7735115Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7735866Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7736589Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7737321Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7738042Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7738773Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7739082Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2022-12-01T10:42:33.7739303Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2022-12-01T10:42:33.7739703Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2022-12-01T10:42:33.7740092Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2022-12-01T10:42:33.7740868Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7741608Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7742334Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.7743062Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7743839Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7744581Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7744829Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2022-12-01T10:42:33.7745060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2022-12-01T10:42:33.7745455Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2022-12-01T10:42:33.7745850Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2022-12-01T10:42:33.7746587Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7747318Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7748036Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7748845Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7749565Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7750298Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7774128Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7774845Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7775629Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7776350Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7777079Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7777797Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7778513Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7778759Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2022-12-01T10:42:33.7778999Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2022-12-01T10:42:33.7779451Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2022-12-01T10:42:33.7779857Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2022-12-01T10:42:33.7780589Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7781325Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7782052Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7782785Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7783509Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7784308Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7784554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2022-12-01T10:42:33.7784790Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2022-12-01T10:42:33.7785172Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2022-12-01T10:42:33.7785565Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2022-12-01T10:42:33.7786298Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7787038Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7787759Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7788536Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
2022-12-01T10:42:33.7790123Z dist init r=0, world=2
2022-12-01T10:42:33.7790234Z dist init r=1, world=2
2022-12-01T10:42:33.7790334Z ok (5.414s)
2022-12-01T10:42:33.7790693Z test_mixture_of_experts_offload_true_shard_grad_op_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16623
2022-12-01T10:42:33.7790911Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16624
2022-12-01T10:42:33.7791284Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.7791441Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.7791821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.7792007Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.7792372Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.7792597Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.7792984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.7793172Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.7793422Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.7793661Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.7794042Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7794438Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7794670Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.7794895Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.7795872Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.7795984Z warnings.warn(
2022-12-01T10:42:33.7796957Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.7797068Z warnings.warn(
2022-12-01T10:42:33.7797359Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1
2022-12-01T10:42:33.7797610Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0
2022-12-01T10:42:33.7798012Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
2022-12-01T10:42:33.7798387Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
2022-12-01T10:42:33.7800125Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1
2022-12-01T10:42:33.7800364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0
2022-12-01T10:42:33.7800758Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
2022-12-01T10:42:33.7801145Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
2022-12-01T10:42:33.7801380Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1
2022-12-01T10:42:33.7801682Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0
2022-12-01T10:42:33.7802087Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
2022-12-01T10:42:33.7803458Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
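The fully_sharded_data_parallel.py:1427 UserWarning above recommends passing `device_id` so that flattening and sharding run on the GPU rather than on the CPU-resident module. A minimal sketch of that remedy (illustrative module; assumes a default process group is already initialized and a GPU is available):

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    module = nn.Linear(16, 16)  # stand-in for the test's CPU-resident module
    wrapped = FSDP(
        module,
        device_id=torch.cuda.current_device(),  # lets FSDP move the module to this GPU first
        sync_module_states=True,                 # requires GPU communication, per the warning
    )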
2022-12-01T10:42:33.7804421Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1
2022-12-01T10:42:33.7804664Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0
2022-12-01T10:42:33.7805058Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes.
2022-12-01T10:42:33.7806179Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes.
2022-12-01T10:42:33.7807247Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1
2022-12-01T10:42:33.7807477Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0
2022-12-01T10:42:33.7807870Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes.
2022-12-01T10:42:33.7823822Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes.
2022-12-01T10:42:33.7839582Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1
2022-12-01T10:42:33.7839825Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0
2022-12-01T10:42:33.7840226Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes.
2022-12-01T10:42:33.7841381Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes.
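The store_based_barrier_key counter in the INFO lines above advances by one each time a process group is created, which is consistent with this FSDP mixture-of-experts test building several subgroups during setup. A self-contained, single-rank sketch of the calls that produce these messages when INFO logging is enabled (illustrative, not the test's code):

    import torch.distributed as dist

    # Single-rank stand-in for the two-rank test environment in this log.
    dist.init_process_group(
        "gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
    )  # first store-based barrier -> store_based_barrier_key:1
    group_a = dist.new_group(ranks=[0])  # each additional group advances the key (key:2, key:3, ...)
    group_b = dist.new_group(ranks=[0])
    dist.destroy_process_group()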
2022-12-01T10:42:33.7842356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1
2022-12-01T10:42:33.7842893Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0
2022-12-01T10:42:33.7843298Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes.
2022-12-01T10:42:33.7844658Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.7844776Z warnings.warn(
2022-12-01T10:42:33.7845166Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes.
2022-12-01T10:42:33.7846530Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.7846635Z warnings.warn(
2022-12-01T10:42:33.7847264Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.7847449Z warnings.warn(
2022-12-01T10:42:33.7848087Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.7848199Z warnings.warn(
2022-12-01T10:42:33.7848974Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.7849098Z warnings.warn(msg, FutureWarning)
2022-12-01T10:42:33.7849856Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
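The two distributed_c10d.py deprecation warnings above name the public replacements for the private collectives. A minimal sketch of the migration (assumes an initialized NCCL process group and CUDA tensors, as in this job):

    import torch
    import torch.distributed as dist

    world_size = dist.get_world_size()
    shard = torch.ones(4, device="cuda")                   # this rank's shard
    gathered = torch.empty(4 * world_size, device="cuda")

    # previously: dist._all_gather_base(gathered, shard)
    dist.all_gather_into_tensor(gathered, shard)

    full = torch.ones(4 * world_size, device="cuda")
    out_shard = torch.empty(4, device="cuda")

    # previously: dist._reduce_scatter_base(out_shard, full)
    dist.reduce_scatter_tensor(out_shard, full)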
2022-12-01T10:42:33.7850003Z warnings.warn(msg, FutureWarning)
2022-12-01T10:42:33.7850247Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1
2022-12-01T10:42:33.7850490Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0
2022-12-01T10:42:33.7850890Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes.
2022-12-01T10:42:33.7853571Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes.
2022-12-01T10:42:33.7856013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1
2022-12-01T10:42:33.7856248Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0
2022-12-01T10:42:33.7856698Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes.
2022-12-01T10:42:33.7857722Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:965: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.)
2022-12-01T10:42:33.7857928Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension)
2022-12-01T10:42:33.7858327Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes.
2022-12-01T10:42:33.7859349Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:965: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.)
2022-12-01T10:42:33.7859553Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension)
2022-12-01T10:42:33.7859798Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1
2022-12-01T10:42:33.7860036Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0
2022-12-01T10:42:33.7860431Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes.
2022-12-01T10:42:33.7860823Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes.
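The torch.testing FutureWarning emitted earlier in this test asks for assert_close in place of the deprecated assert_allclose (see https://github.com/pytorch/pytorch/issues/61844). A minimal before/after sketch:

    import torch
    from torch.testing import assert_close

    actual = torch.tensor([1.0, 2.0])
    expected = torch.tensor([1.0, 2.0 + 1e-7])

    # previously: torch.testing.assert_allclose(actual, expected)
    assert_close(actual, expected)  # note: default tolerances are dtype-based and may differ from assert_allclose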
2022-12-01T10:42:33.7865610Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1
2022-12-01T10:42:33.7865831Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0
2022-12-01T10:42:33.7866238Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes.
2022-12-01T10:42:33.7866630Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes.
2022-12-01T10:42:33.7871386Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1
2022-12-01T10:42:33.7871617Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0
2022-12-01T10:42:33.7872014Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes.
2022-12-01T10:42:33.7872404Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes.
2022-12-01T10:42:33.7877106Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1
2022-12-01T10:42:33.7877344Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0
2022-12-01T10:42:33.7877725Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes.
2022-12-01T10:42:33.7878118Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes.
2022-12-01T10:42:33.7882992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1
2022-12-01T10:42:33.7883227Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0
2022-12-01T10:42:33.7883628Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes.
2022-12-01T10:42:33.7884114Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes.
2022-12-01T10:42:33.7888827Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1
2022-12-01T10:42:33.7889081Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0
2022-12-01T10:42:33.7889467Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes.
2022-12-01T10:42:33.7889858Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes.
2022-12-01T10:42:33.7894558Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1
2022-12-01T10:42:33.7894794Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0
2022-12-01T10:42:33.7895194Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes.
2022-12-01T10:42:33.7895585Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes.
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7928290Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7928536Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2022-12-01T10:42:33.7928854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2022-12-01T10:42:33.7929264Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2022-12-01T10:42:33.7929662Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2022-12-01T10:42:33.7930396Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7931127Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7931852Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7932587Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7933310Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7934092Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.7934348Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2022-12-01T10:42:33.7934584Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2022-12-01T10:42:33.7934985Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2022-12-01T10:42:33.7935377Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2022-12-01T10:42:33.7936104Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7936838Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7937557Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7938366Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7939091Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7939825Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.7939943Z dist init r=0, world=2 2022-12-01T10:42:33.7940052Z dist init r=1, world=2 2022-12-01T10:42:33.7940152Z ok (5.413s) 2022-12-01T10:42:33.7940516Z test_mixture_of_experts_with_delay_before_free_offload_false_no_shard (__main__.TestParityWithDDP) ... 
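The python_variable.cpp:327 warning that floods the output above names the private Tensor._fix_weakref() method. A minimal sketch of the call pattern the warning text recommends, assuming only a tensor held through a weak reference; the variable names are illustrative and not taken from the test:

```python
import weakref

import torch

t = torch.ones(3)      # tensor with a live PyObject
wr = weakref.ref(t)    # take out a weak reference, as the warning describes

# ... later, recover the tensor through the weak reference ...
recovered = wr()
if recovered is not None:
    # The warning asks for the (private) _fix_weakref() call after
    # dereferencing, so the PyObject bookkeeping stays consistent when
    # the Tensor is later deallocated from C++.
    recovered._fix_weakref()
```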
2022-12-01T10:42:33.7940516Z test_mixture_of_experts_with_delay_before_free_offload_false_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16838
2022-12-01T10:42:33.7940771Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16839
2022-12-01T10:42:33.7941145Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.7941323Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.7941684Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.7941878Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
[... same slow-test/disabled-test UserWarnings from the second process elided ...]
2022-12-01T10:42:33.7943308Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.7943553Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.7943954Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7944331Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.7944565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.7944794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.7945775Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.7945890Z warnings.warn(
[... same FSDP CPU-module UserWarning from the other rank elided ...]
2022-12-01T10:42:33.7947280Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1
2022-12-01T10:42:33.7947525Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0
2022-12-01T10:42:33.7947927Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
2022-12-01T10:42:33.7948548Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.7948641Z warnings.warn(
2022-12-01T10:42:33.7949038Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
[... duplicate _all_gather_base UserWarning from the other rank elided ...]
2022-12-01T10:42:33.7950542Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.7950683Z warnings.warn(msg, FutureWarning)
[... duplicate assert_allclose FutureWarning from the other rank elided ...]
2022-12-01T10:42:33.7951882Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0
2022-12-01T10:42:33.7952134Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1
2022-12-01T10:42:33.7952540Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
2022-12-01T10:42:33.7952912Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
[... repeated python_variable.cpp:327 decref warnings elided ...]
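The fully_sharded_data_parallel.py:1427 UserWarning above recommends passing device_id so FSDP can move the wrapped module to the GPU before flattening and sharding. A minimal sketch of that recommendation, assuming a process group is already initialized and using a placeholder nn.Linear rather than the test's mixture-of-experts model:

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

module = nn.Linear(8, 8)  # placeholder; the tests wrap a larger model

# device_id lets FSDP move the module to the GPU itself, so flattening,
# sharding, and sync_module_states=True all run with GPU communication.
fsdp_module = FSDP(
    module,
    device_id=torch.cuda.current_device(),
    sync_module_states=True,
)
```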
[... repeated python_variable.cpp:327 decref warnings elided ...]
2022-12-01T10:42:33.7957650Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0
2022-12-01T10:42:33.7957894Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1
2022-12-01T10:42:33.7958290Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
2022-12-01T10:42:33.7958691Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
[... the same "Added key"/"Completed store-based barrier" pattern repeats on both ranks for store_based_barrier_key:5 through store_based_barrier_key:13, interleaved with further repeated python_variable.cpp:327 decref warnings; those lines are elided here ...]
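The store_based_barrier_key INFO lines above are emitted when a process group is created: each group creation runs a store-based barrier and the key counter climbs with every new group, which is why it runs from 1 up through the high teens within a single test. A minimal sketch of per-rank setup that produces such lines; the backend, rendezvous values, rank, and world size are assumptions, not values taken from the test harness:

```python
import os

import torch.distributed as dist

# Illustrative rendezvous settings (not taken from the log).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "2"))

# Creating the default group logs "Added key: store_based_barrier_key:1 ..."
# followed by "Completed store-based barrier ..." on each rank.
# The backend is an assumption; the distributed GPU jobs typically use NCCL.
dist.init_process_group("nccl", rank=rank, world_size=world_size)

# Each further group creation bumps the key (key:2, key:3, ...).
subgroup = dist.new_group(ranks=list(range(world_size)))
```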
[... remaining python_variable.cpp:327 decref warnings for this test elided ...]
2022-12-01T10:42:33.8027272Z dist init r=1, world=2
2022-12-01T10:42:33.8027381Z dist init r=0, world=2
2022-12-01T10:42:33.8027483Z ok (8.618s)
2022-12-01T10:42:33.8027840Z test_mixture_of_experts_with_delay_before_free_offload_false_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17041
2022-12-01T10:42:33.8028061Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17042
2022-12-01T10:42:33.8028434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.8028648Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.8029034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.8029225Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
[... same slow-test/disabled-test UserWarnings from the second process elided ...]
2022-12-01T10:42:33.8030569Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.8030818Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.8031207Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.8031602Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.8031836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.8032063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.8033044Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.8033164Z warnings.warn(
2022-12-01T10:42:33.8033458Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1
[... same FSDP CPU-module UserWarning from the other rank elided ...]
2022-12-01T10:42:33.8034802Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0
2022-12-01T10:42:33.8035196Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
2022-12-01T10:42:33.8035576Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
2022-12-01T10:42:33.8036206Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.8036315Z warnings.warn(
[... duplicate _all_gather_base UserWarning from the other rank elided ...]
2022-12-01T10:42:33.8037669Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.8037838Z warnings.warn(
[... duplicate _reduce_scatter_base UserWarning from the other rank elided ...]
2022-12-01T10:42:33.8039351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.8039491Z warnings.warn(msg, FutureWarning)
[... duplicate assert_allclose FutureWarning from the other rank elided ...]
2022-12-01T10:42:33.8040666Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1
2022-12-01T10:42:33.8041072Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
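The three deprecation warnings above each point at a public replacement API (_all_gather_base to all_gather_into_tensor, _reduce_scatter_base to reduce_scatter_tensor, assert_allclose to assert_close). A minimal migration sketch, assuming an already-initialized two-rank process group and illustrative tensor shapes; none of this is taken from the test code itself:

```python
import torch
import torch.distributed as dist
from torch.testing import assert_close

# Assumes dist.init_process_group(...) has already run on every rank.
rank = dist.get_rank()
world_size = dist.get_world_size()

# torch.distributed._all_gather_base -> all_gather_into_tensor
local = torch.full((4,), float(rank), device="cuda")
gathered = torch.empty(world_size * 4, device="cuda")
dist.all_gather_into_tensor(gathered, local)

# torch.distributed._reduce_scatter_base -> reduce_scatter_tensor
to_scatter = torch.ones(world_size * 4, device="cuda")
scattered = torch.empty(4, device="cuda")
dist.reduce_scatter_tensor(scattered, to_scatter)

# torch.testing.assert_allclose -> torch.testing.assert_close
# (summing ones across world_size ranks gives world_size per element)
assert_close(scattered, torch.full((4,), float(world_size), device="cuda"))
```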
2022-12-01T10:42:33.8041310Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0
2022-12-01T10:42:33.8041704Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
[... repeated python_variable.cpp:327 decref warnings elided ...]
2022-12-01T10:42:33.8046652Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1
2022-12-01T10:42:33.8047046Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
2022-12-01T10:42:33.8047360Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0
2022-12-01T10:42:33.8047738Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
[... repeated python_variable.cpp:327 decref warnings elided ...]
2022-12-01T10:42:33.8052460Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1
2022-12-01T10:42:33.8052702Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0
2022-12-01T10:42:33.8053105Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes.
2022-12-01T10:42:33.8053500Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes.
[... repeated python_variable.cpp:327 decref warnings elided ...]
(function decref) 2022-12-01T10:42:33.8065304Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8066108Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8066832Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8067557Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8068284Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8069014Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8069742Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8070040Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2022-12-01T10:42:33.8070296Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2022-12-01T10:42:33.8070681Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.8071074Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.8071813Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8072545Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8073285Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8074009Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8074808Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8075532Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8075775Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2022-12-01T10:42:33.8076018Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2022-12-01T10:42:33.8076411Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.8076807Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.8077541Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8078265Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8078996Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8079773Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8080519Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8081242Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8081491Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2022-12-01T10:42:33.8081732Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2022-12-01T10:42:33.8082113Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2022-12-01T10:42:33.8082672Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2022-12-01T10:42:33.8083420Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8084242Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8084982Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8085705Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8086434Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8087155Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8087397Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2022-12-01T10:42:33.8087640Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2022-12-01T10:42:33.8088039Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 
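The _all_gather_base/_reduce_scatter_base UserWarnings earlier in this test's output recommend the tensor-based collectives; a minimal sketch of that migration, assuming a process group is already initialized and local_chunk is a 1-D tensor on the current CUDA device:

    import torch
    import torch.distributed as dist

    def gather_then_reduce_scatter(local_chunk: torch.Tensor, world_size: int):
        # all_gather_into_tensor replaces the private _all_gather_base:
        # every rank contributes local_chunk and receives the concatenation.
        gathered = torch.empty(world_size * local_chunk.numel(),
                               dtype=local_chunk.dtype, device=local_chunk.device)
        dist.all_gather_into_tensor(gathered, local_chunk)

        # reduce_scatter_tensor replaces the private _reduce_scatter_base:
        # the element-wise sum of `gathered` across ranks is split back so
        # each rank keeps a shard the size of local_chunk.
        shard = torch.empty_like(local_chunk)
        dist.reduce_scatter_tensor(shard, gathered)
        return gathered, shard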
2022-12-01T10:42:33.8088493Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.8089259Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8089990Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8090726Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8091452Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8092182Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8092970Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8093219Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2022-12-01T10:42:33.8093456Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2022-12-01T10:42:33.8093842Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.8094241Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.8094982Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8095718Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8096460Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8097183Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8097968Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8098709Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8098956Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2022-12-01T10:42:33.8099197Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2022-12-01T10:42:33.8099598Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.8099992Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.8100724Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8101452Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8102236Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8102958Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8103691Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8104420Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8104666Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2022-12-01T10:42:33.8104901Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2022-12-01T10:42:33.8105280Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.8106008Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8106790Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8107529Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8107928Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.8108655Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8109375Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8110109Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8110351Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2022-12-01T10:42:33.8110642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2022-12-01T10:42:33.8111044Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2022-12-01T10:42:33.8111434Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 
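Each store_based_barrier_key:N pair above is logged when a process group is created: both ranks write a key into the shared store and wait until every peer has done the same. A rough sketch of the calls that produce this pattern (illustrative, not the actual test harness, and assuming MASTER_ADDR/MASTER_PORT are set in the environment):

    import torch.distributed as dist

    def setup(rank: int, world_size: int):
        # The first store-based barrier (store_based_barrier_key:1) comes from
        # init_process_group itself.
        dist.init_process_group("nccl", init_method="env://",
                                rank=rank, world_size=world_size)
        # Each subsequent new_group call performs another store-based barrier,
        # which is why the key counter keeps incrementing during the test.
        pg = dist.new_group(ranks=list(range(world_size)))
        return pg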
2022-12-01T10:42:33.8112179Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8112909Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8113641Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8114374Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8115101Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8115885Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8116009Z dist init r=1, world=2 2022-12-01T10:42:33.8116118Z dist init r=0, world=2 2022-12-01T10:42:33.8116201Z ok (10.521s) 2022-12-01T10:42:33.8116574Z test_mixture_of_experts_with_delay_before_free_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17244 2022-12-01T10:42:33.8116794Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17245 2022-12-01T10:42:33.8117173Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8117352Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8117735Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8117928Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8118294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8118468Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8118824Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8119012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8119260Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8119565Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8119971Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8120366Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8120592Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8120820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8121798Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8121917Z warnings.warn( 2022-12-01T10:42:33.8123096Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8123218Z warnings.warn( 2022-12-01T10:42:33.8123462Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:42:33.8123705Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:42:33.8124106Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
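The fully_sharded_data_parallel.py warning repeated above suggests passing device_id so FSDP moves the CPU-constructed module to the GPU before flattening and sharding; a minimal sketch of that construction (the model is illustrative, and one GPU per rank is assumed):

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap(rank: int) -> FSDP:
        torch.cuda.set_device(rank)
        model = nn.Linear(1024, 1024)  # built on CPU, as in the warning
        # device_id tells FSDP which GPU to move the module to; it is also
        # needed for sync_module_states=True, which broadcasts rank 0's
        # parameters to the other ranks.
        return FSDP(model,
                    device_id=torch.cuda.current_device(),
                    sync_module_states=True)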
2022-12-01T10:42:33.8124802Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8124927Z warnings.warn( 2022-12-01T10:42:33.8125331Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:42:33.8125953Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8126062Z warnings.warn( 2022-12-01T10:42:33.8126675Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8126791Z warnings.warn( 2022-12-01T10:42:33.8127423Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8127532Z warnings.warn( 2022-12-01T10:42:33.8128300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8128443Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8129207Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8129439Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8129687Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-12-01T10:42:33.8129929Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-12-01T10:42:33.8130333Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.8130711Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.8131459Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8132200Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8132923Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8133642Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8134435Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8135181Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8135427Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-12-01T10:42:33.8135825Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-12-01T10:42:33.8136064Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-12-01T10:42:33.8136452Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-12-01T10:42:33.8137198Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8137927Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8138735Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8139462Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8140194Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8140963Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8141209Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2022-12-01T10:42:33.8141451Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2022-12-01T10:42:33.8141852Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.8142228Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.8142968Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8143752Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8144501Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8145224Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8145962Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8146692Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8147410Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8148194Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8148935Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8149663Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8150401Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8151127Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8151856Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8152677Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8153426Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8154150Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8154885Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8155608Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8156331Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8157111Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8157843Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8158559Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8158811Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2022-12-01T10:42:33.8159057Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2022-12-01T10:42:33.8159454Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.8160182Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8160961Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8161707Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8162101Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.8163047Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8163785Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8164515Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8164759Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2022-12-01T10:42:33.8165002Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2022-12-01T10:42:33.8165478Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.8165875Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.8166618Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8167361Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8168090Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8168825Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8169547Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8170341Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8170602Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2022-12-01T10:42:33.8171005Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 
2022-12-01T10:42:33.8171247Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2022-12-01T10:42:33.8171639Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2022-12-01T10:42:33.8172375Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8173117Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8173842Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8174577Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8175368Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8176099Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8176342Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2022-12-01T10:42:33.8176585Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2022-12-01T10:42:33.8176969Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.8177364Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.8178099Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8178834Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8179614Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8180364Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8181088Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8181828Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8182076Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2022-12-01T10:42:33.8182314Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2022-12-01T10:42:33.8182713Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.8183108Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.8183845Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8184643Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8185366Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8186098Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8186822Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8187558Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8187813Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2022-12-01T10:42:33.8188054Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2022-12-01T10:42:33.8188501Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.8188897Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.8189632Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8190365Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8191098Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8191831Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8192551Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8193366Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8193614Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2022-12-01T10:42:33.8193853Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2022-12-01T10:42:33.8194248Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 
2022-12-01T10:42:33.8194643Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.8195383Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8196119Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8196841Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8197677Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8198423Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8199158Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8199405Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2022-12-01T10:42:33.8199804Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2022-12-01T10:42:33.8200047Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2022-12-01T10:42:33.8200427Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2022-12-01T10:42:33.8201165Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8201898Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8202905Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8203659Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8204382Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8205123Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8205238Z dist init r=0, world=2 2022-12-01T10:42:33.8205345Z dist init r=1, world=2 2022-12-01T10:42:33.8205444Z ok (8.818s) 2022-12-01T10:42:33.8205806Z test_mixture_of_experts_with_delay_before_free_offload_true_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17447 2022-12-01T10:42:33.8206028Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17448 2022-12-01T10:42:33.8206404Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8206654Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8207039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8207233Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8207597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8207769Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8208138Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8208323Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8208574Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8208816Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8209202Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8209598Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
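The repeated `store_based_barrier_key:N` messages come from c10d's store-based barrier, which ranks use to rendezvous after creating a process group. Below is a simplified sketch (not the actual `_store_based_barrier` implementation) of how such a barrier can be built on `torch.distributed.TCPStore`: every rank increments a shared counter and then polls it until all `world_size` ranks have checked in. The host, port, and key values are placeholders.

```python
import time
import torch.distributed as dist

def store_based_barrier(store, rank, world_size, key="store_based_barrier_key:1"):
    # Each rank announces its arrival by incrementing the shared counter.
    store.add(key, 1)
    # add(key, 0) reads the counter without changing it; poll until every rank
    # has checked in, which is when the "Completed store-based barrier" lines
    # above are emitted.
    while store.add(key, 0) < world_size:
        time.sleep(0.01)

# Hypothetical usage with a TCPStore shared by two ranks:
# store = dist.TCPStore("127.0.0.1", 29500, world_size=2, is_master=(rank == 0))
# store_based_barrier(store, rank, world_size=2)
```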
2022-12-01T10:42:33.8209826Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8210054Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8211034Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8211232Z warnings.warn( 2022-12-01T10:42:33.8212209Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8212322Z warnings.warn( 2022-12-01T10:42:33.8212563Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:42:33.8212808Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:42:33.8213210Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:42:33.8213606Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:42:33.8214338Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8215085Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8215328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-12-01T10:42:33.8215573Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-12-01T10:42:33.8216016Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.8216422Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.8216662Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-12-01T10:42:33.8216897Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-12-01T10:42:33.8217291Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 
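The FSDP `UserWarning` in this chunk recommends passing `device_id` so that flattening and sharding run on the GPU instead of the CPU. A minimal sketch of what the warning is asking for is shown below, assuming a process group is already initialized; the toy module is hypothetical, and the point is the `device_id` argument.

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Hypothetical module that still lives on CPU at wrap time.
module = nn.Linear(8, 8)

# Passing device_id lets FSDP move the module onto the given CUDA device before
# flattening/sharding, avoiding the slower CPU path the warning mentions and
# allowing sync_module_states=True, which requires GPU communication.
fsdp_module = FSDP(
    module,
    device_id=torch.cuda.current_device(),
    sync_module_states=True,
)
```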
2022-12-01T10:42:33.8218041Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8218438Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-12-01T10:42:33.8219175Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8219414Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2022-12-01T10:42:33.8219637Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2022-12-01T10:42:33.8220095Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.8220834Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8221226Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.8221962Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8222198Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2022-12-01T10:42:33.8222439Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2022-12-01T10:42:33.8222830Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.8223566Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8224288Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8225078Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8225822Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8226558Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8227287Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8228026Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8228749Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8229550Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8230274Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8231004Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8231735Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8232467Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8233190Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8233971Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8234709Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8235439Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8236168Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8236895Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8237616Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8238414Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8238813Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.8239526Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8240245Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8241026Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8241755Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8242692Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8243514Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8244269Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8244994Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8245731Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8246454Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8247185Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8247990Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8248717Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8249435Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8250173Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8250897Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8251628Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8252403Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8253146Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8253872Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8254602Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8254849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2022-12-01T10:42:33.8255092Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2022-12-01T10:42:33.8255493Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.8256221Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8256692Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.8257429Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8257668Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2022-12-01T10:42:33.8257904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2022-12-01T10:42:33.8258294Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2022-12-01T10:42:33.8259030Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8259640Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8259754Z warnings.warn( 2022-12-01T10:42:33.8260146Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2022-12-01T10:42:33.8260881Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8261546Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8261665Z warnings.warn( 2022-12-01T10:42:33.8262440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8262580Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8263342Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
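Two deprecation warnings appear in this chunk: `torch.distributed._all_gather_base` should be replaced with `torch.distributed.all_gather_into_tensor`, and `torch.testing.assert_allclose` with `torch.testing.assert_close`. A minimal sketch of both migrations follows, assuming a process group is already initialized and every rank holds a same-shaped tensor; the function and variable names are made up for illustration.

```python
import torch
import torch.distributed as dist

def gather_and_check(local: torch.Tensor, expected: torch.Tensor) -> torch.Tensor:
    world_size = dist.get_world_size()
    # New-style all-gather: write every rank's tensor into one flat output tensor,
    # replacing the deprecated private _all_gather_base.
    output = torch.empty(world_size * local.numel(),
                         dtype=local.dtype, device=local.device)
    dist.all_gather_into_tensor(output, local)
    # New-style comparison, replacing the deprecated torch.testing.assert_allclose.
    torch.testing.assert_close(output.view(world_size, -1)[dist.get_rank()], expected)
    return output
```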
2022-12-01T10:42:33.8263488Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8263734Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2022-12-01T10:42:33.8263976Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2022-12-01T10:42:33.8264359Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.8265101Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8265903Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8266644Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8267040Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.8267778Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8268512Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8269234Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8269481Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2022-12-01T10:42:33.8269718Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2022-12-01T10:42:33.8270123Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.8270906Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8271648Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8272366Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8272768Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.8273484Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8274199Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8274996Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8275239Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2022-12-01T10:42:33.8275472Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2022-12-01T10:42:33.8275848Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.8276240Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.8276983Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8277722Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8278441Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8279223Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8279971Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8280706Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8280949Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2022-12-01T10:42:33.8281344Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.8281589Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2022-12-01T10:42:33.8281984Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.8282896Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8283639Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8284458Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8285192Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8285911Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8286650Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8286894Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2022-12-01T10:42:33.8287293Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2022-12-01T10:42:33.8287518Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2022-12-01T10:42:33.8287915Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2022-12-01T10:42:33.8288717Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8289472Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8290196Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8290931Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8291653Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8292389Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8292745Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2022-12-01T10:42:33.8293152Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2022-12-01T10:42:33.8293394Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2022-12-01T10:42:33.8293789Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2022-12-01T10:42:33.8294529Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref)
2022-12-01T10:42:33.8295260Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
2022-12-01T10:42:33.8298489Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0
2022-12-01T10:42:33.8298723Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1
2022-12-01T10:42:33.8299107Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes.
2022-12-01T10:42:33.8301682Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes.
2022-12-01T10:42:33.8304181Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0
2022-12-01T10:42:33.8304420Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1
2022-12-01T10:42:33.8304819Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes.
2022-12-01T10:42:33.8307461Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes.
2022-12-01T10:42:33.8309876Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0
2022-12-01T10:42:33.8310109Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1
2022-12-01T10:42:33.8310509Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes.
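The python_variable.cpp:327 warning above refers to the private helper Tensor._fix_weakref(). A minimal, hypothetical Python sketch of the pattern the warning asks for (illustrative only, not code from the test being run; the tensor name and shape are made up):

import weakref
import torch

t = torch.ones(3)
ref = weakref.ref(t)       # take out a weak reference to the Tensor

obj = ref()                # dereference the weak reference later
if obj is not None:
    obj._fix_weakref()     # private helper named by the warning; re-syncs the PyObject with the C++ Tensor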
2022-12-01T10:42:33.8331141Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes.
2022-12-01T10:42:33.8351270Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1
2022-12-01T10:42:33.8351520Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0
2022-12-01T10:42:33.8351924Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes.
2022-12-01T10:42:33.8354489Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes.
2022-12-01T10:42:33.8356982Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0
2022-12-01T10:42:33.8357218Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1
2022-12-01T10:42:33.8357611Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes.
2022-12-01T10:42:33.8360190Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes.
2022-12-01T10:42:33.8362713Z dist init r=0, world=2
2022-12-01T10:42:33.8362822Z dist init r=1, world=2
2022-12-01T10:42:33.8362905Z ok (11.424s)
2022-12-01T10:42:33.8363263Z test_mixture_of_experts_with_delay_before_free_offload_true_none (__main__.TestParityWithDDP) ...
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17662
2022-12-01T10:42:33.8363481Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17663
2022-12-01T10:42:33.8363858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.8364031Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.8364399Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.8364662Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.8365049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.8365239Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.8365602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.8365789Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.8366033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.8366274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.8366673Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.8367071Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.8367303Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.8367531Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.8368510Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.8368624Z warnings.warn(
2022-12-01T10:42:33.8369661Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.8369773Z warnings.warn(
2022-12-01T10:42:33.8370018Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1
2022-12-01T10:42:33.8370260Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0
2022-12-01T10:42:33.8370663Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
2022-12-01T10:42:33.8371057Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
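The fully_sharded_data_parallel.py:1427 UserWarning above recommends passing `device_id` when wrapping a CPU module. A hedged sketch of that construction, assuming a toy nn.Linear stand-in module, one GPU per rank, and an already-initialized process group (not the test's actual mixture-of-experts model):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# assumes torch.distributed.init_process_group(...) has already run on each rank
model = nn.Linear(8, 8)  # stand-in for the real module under test
fsdp_model = FSDP(
    model,
    device_id=torch.cuda.current_device(),  # lets FSDP move the module to GPU before flattening/sharding
    sync_module_states=True,                # needs the module on GPU, as the warning notes
)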
2022-12-01T10:42:33.8372789Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1
2022-12-01T10:42:33.8373030Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0
2022-12-01T10:42:33.8373422Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
2022-12-01T10:42:33.8373878Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
2022-12-01T10:42:33.8374104Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1
2022-12-01T10:42:33.8374341Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0
2022-12-01T10:42:33.8374734Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
2022-12-01T10:42:33.8375124Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
2022-12-01T10:42:33.8376848Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0
2022-12-01T10:42:33.8377085Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1
2022-12-01T10:42:33.8377477Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes.
2022-12-01T10:42:33.8377866Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes.
2022-12-01T10:42:33.8379664Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0
2022-12-01T10:42:33.8379890Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1
2022-12-01T10:42:33.8380279Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes.
2022-12-01T10:42:33.8380674Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes.
2022-12-01T10:42:33.8412118Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0
2022-12-01T10:42:33.8412364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1
2022-12-01T10:42:33.8412763Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes.
2022-12-01T10:42:33.8413141Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes.
2022-12-01T10:42:33.8414904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0
2022-12-01T10:42:33.8415142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1
2022-12-01T10:42:33.8415541Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes.
2022-12-01T10:42:33.8415933Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes.
2022-12-01T10:42:33.8418033Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.8418145Z warnings.warn(
2022-12-01T10:42:33.8418761Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.8418929Z warnings.warn(
2022-12-01T10:42:33.8419553Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.8419665Z warnings.warn(
2022-12-01T10:42:33.8420291Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.8420400Z warnings.warn(
2022-12-01T10:42:33.8421164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.8421306Z warnings.warn(msg, FutureWarning)
2022-12-01T10:42:33.8422070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.8422212Z warnings.warn(msg, FutureWarning)
2022-12-01T10:42:33.8422458Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0
2022-12-01T10:42:33.8422698Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1
2022-12-01T10:42:33.8423097Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes.
2022-12-01T10:42:33.8423473Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes.
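The deprecation warnings above name the public replacements for the private collectives and for torch.testing.assert_allclose(). A small illustrative sketch of the suggested migration, assuming an already-initialized process group (with the NCCL backend these tensors would live on each rank's GPU) and made-up tensor shapes:

import torch
import torch.distributed as dist

# assumes dist.init_process_group(...) has already run on each rank
world_size = dist.get_world_size()
shard = torch.ones(4)

gathered = torch.empty(world_size * 4)
dist.all_gather_into_tensor(gathered, shard)   # replaces torch.distributed._all_gather_base

out = torch.empty(4)
dist.reduce_scatter_tensor(out, gathered)      # replaces torch.distributed._reduce_scatter_base

# replaces torch.testing.assert_allclose; each element of out is the sum of ones across ranks
torch.testing.assert_close(out, torch.full((4,), float(world_size)))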
2022-12-01T10:42:33.8428273Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1
2022-12-01T10:42:33.8428517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0
2022-12-01T10:42:33.8428921Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes.
2022-12-01T10:42:33.8429314Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes.
2022-12-01T10:42:33.8434076Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1
2022-12-01T10:42:33.8434311Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0
2022-12-01T10:42:33.8434712Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes.
2022-12-01T10:42:33.8435092Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes.
2022-12-01T10:42:33.8439796Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0
2022-12-01T10:42:33.8440033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1
2022-12-01T10:42:33.8440428Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes.
2022-12-01T10:42:33.8440859Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes.
2022-12-01T10:42:33.8445789Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1
2022-12-01T10:42:33.8446025Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0
2022-12-01T10:42:33.8446422Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes.
2022-12-01T10:42:33.8446910Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes.
2022-12-01T10:42:33.8451634Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1
2022-12-01T10:42:33.8451929Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0
2022-12-01T10:42:33.8452373Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes.
2022-12-01T10:42:33.8452808Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes.
2022-12-01T10:42:33.8457645Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it.
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8457923Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2022-12-01T10:42:33.8458196Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2022-12-01T10:42:33.8458582Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2022-12-01T10:42:33.8459075Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2022-12-01T10:42:33.8459851Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8460683Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8461496Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8462269Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8463029Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8463808Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8464093Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2022-12-01T10:42:33.8464416Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2022-12-01T10:42:33.8464857Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2022-12-01T10:42:33.8465292Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2022-12-01T10:42:33.8466073Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8466914Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8467683Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8468457Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8469234Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8470094Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8470380Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2022-12-01T10:42:33.8470663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2022-12-01T10:42:33.8471152Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2022-12-01T10:42:33.8471550Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2022-12-01T10:42:33.8472329Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8473111Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8473885Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8474659Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail.
(function decref) 2022-12-01T10:42:33.8506848Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2022-12-01T10:42:33.8507129Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2022-12-01T10:42:33.8507573Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2022-12-01T10:42:33.8508020Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2022-12-01T10:42:33.8508743Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8509561Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8510342Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8511112Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8511879Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8512695Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8512986Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2022-12-01T10:42:33.8513306Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2022-12-01T10:42:33.8513744Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2022-12-01T10:42:33.8514239Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2022-12-01T10:42:33.8515015Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8515791Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8516554Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8517377Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8518145Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8518918Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8519073Z dist init r=1, world=2 2022-12-01T10:42:33.8519264Z dist init r=0, world=2 2022-12-01T10:42:33.8519411Z ok (11.322s) 2022-12-01T10:42:33.8519765Z test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... 
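For context on the python_variable.cpp:327 warning that repeats throughout the block above: it fires when a Tensor's C++ object is deallocated while its Python object still has live references, and the message itself points at the private Tensor._fix_weakref() helper. The following is a minimal, hedged sketch of the access pattern the warning describes, not the test suite's actual code; whether the _fix_weakref() call is needed depends on the PyTorch build.

import weakref

import torch

t = torch.ones(3)       # tensor owned from Python
wr = weakref.ref(t)     # take out a weak reference, as the warning describes

resurrected = wr()      # dereference the weak reference to get the tensor back
if resurrected is not None:
    # The warning advises calling the private _fix_weakref() after
    # dereferencing, so the PyObject and the C++ tensor stay consistent.
    resurrected._fix_weakref()

del t, resurrected      # dropping the strong references should not warn now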
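The "dist init r=0, world=2" / "dist init r=1, world=2" lines and the store_based_barrier_key INFO messages above come from the two worker processes initializing process groups; in this PyTorch version each group creation runs a store-based barrier under a fresh key. A rough, self-contained sketch of that two-process setup (the gloo backend, the port, and the toy all_reduce are placeholder choices, not what the FSDP test itself does):

import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"   # placeholder rendezvous port
    # init_process_group runs the store-based barrier logged as
    # store_based_barrier_key:1; every later dist.new_group() bumps the key.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    print(f"dist init r={rank}, world={world_size}")
    subgroup = dist.new_group(ranks=[0, 1])   # logs store_based_barrier_key:2
    t = torch.ones(1)
    dist.all_reduce(t, group=subgroup)
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)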
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17877 2022-12-01T10:42:33.8520032Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17878 2022-12-01T10:42:33.8520445Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8520741Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8521164Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8521392Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8521797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8522008Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8522603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8522792Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8523075Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8523358Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8523848Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8524392Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8524663Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8524928Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8525958Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8526115Z warnings.warn( 2022-12-01T10:42:33.8527131Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8527315Z warnings.warn( 2022-12-01T10:42:33.8527546Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:42:33.8527866Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:42:33.8528303Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:42:33.8528735Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
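The fully_sharded_data_parallel.py UserWarning above is emitted because the module handed to FSDP still lives on CPU; it recommends passing device_id so flattening and sharding happen on the GPU, which is also what sync_module_states=True needs for its rank-0 broadcast. A rough sketch of the suggested construction, assuming CUDA is available and a process group is already initialized (the Linear module is a stand-in, not the test's mixture-of-experts model):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

model = nn.Linear(8, 8)   # stand-in module, initially on CPU

# device_id tells FSDP to move the module to the local GPU before it
# flattens and shards parameters, avoiding the slower CPU path the
# warning describes; sync_module_states=True then broadcasts rank 0's
# parameters and buffers over that GPU process group.
fsdp_model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    sync_module_states=True,
)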
2022-12-01T10:42:33.8529594Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8530401Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8530682Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-12-01T10:42:33.8530959Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-12-01T10:42:33.8531390Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.8531865Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:42:33.8532156Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-12-01T10:42:33.8532381Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-12-01T10:42:33.8532814Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-12-01T10:42:33.8533244Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-12-01T10:42:33.8534024Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8534871Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8535195Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1 2022-12-01T10:42:33.8535473Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0 2022-12-01T10:42:33.8535952Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.8536384Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 2 nodes. 2022-12-01T10:42:33.8537163Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8537955Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8538236Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1 2022-12-01T10:42:33.8538513Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0 2022-12-01T10:42:33.8538894Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.8539322Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 2 nodes. 2022-12-01T10:42:33.8540163Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8541038Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8541808Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8542576Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8543334Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8544107Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8544999Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8545776Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8546541Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail.
(function decref) 2022-12-01T10:42:33.8571229Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8571986Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8572788Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8573127Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1 2022-12-01T10:42:33.8573413Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 0 2022-12-01T10:42:33.8573859Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.8574303Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 2 nodes. 2022-12-01T10:42:33.8575024Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8575794Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8576079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1 2022-12-01T10:42:33.8576353Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0 2022-12-01T10:42:33.8576872Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2022-12-01T10:42:33.8577311Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 2 nodes. 2022-12-01T10:42:33.8578086Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8578923Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8579603Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8579754Z warnings.warn( 2022-12-01T10:42:33.8580416Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8580566Z warnings.warn( 2022-12-01T10:42:33.8581276Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8581375Z warnings.warn( 2022-12-01T10:42:33.8582054Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8582201Z warnings.warn( 2022-12-01T10:42:33.8583008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8583246Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8584052Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8584272Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8584560Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0 2022-12-01T10:42:33.8584889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1 2022-12-01T10:42:33.8585335Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.8585768Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:9 with 2 nodes. 2022-12-01T10:42:33.8586504Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8587290Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8588168Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8589015Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8589807Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8590807Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8591161Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0 2022-12-01T10:42:33.8591437Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1 2022-12-01T10:42:33.8591883Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.8592323Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 2 nodes. 2022-12-01T10:42:33.8593100Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8593941Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8594777Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8595553Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8596363Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8597146Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8597430Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1 2022-12-01T10:42:33.8597706Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0 2022-12-01T10:42:33.8598152Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.8598588Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 2 nodes. 2022-12-01T10:42:33.8599361Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8600148Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8600912Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8601735Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8602673Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8603480Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8603897Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1 2022-12-01T10:42:33.8604174Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 0 2022-12-01T10:42:33.8604617Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.8605054Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 2 nodes. 2022-12-01T10:42:33.8605836Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8606666Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8607453Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8608227Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8609054Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8609838Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8610124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0 2022-12-01T10:42:33.8610398Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1 2022-12-01T10:42:33.8610844Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2022-12-01T10:42:33.8611324Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 2 nodes. 2022-12-01T10:42:33.8612114Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8612841Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8613650Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8614489Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8615262Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8616032Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8616333Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0 2022-12-01T10:42:33.8630023Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1 2022-12-01T10:42:33.8630477Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2022-12-01T10:42:33.8630882Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 2 nodes. 2022-12-01T10:42:33.8631637Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8632498Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8633253Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8633999Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8634734Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8635476Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8635724Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0 2022-12-01T10:42:33.8636121Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2022-12-01T10:42:33.8636429Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1 2022-12-01T10:42:33.8636836Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 2 nodes. 2022-12-01T10:42:33.8637579Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8638317Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8639055Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8639781Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8640504Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8641368Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8641628Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1 2022-12-01T10:42:33.8641849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0 2022-12-01T10:42:33.8642256Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2022-12-01T10:42:33.8642893Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 2 nodes. 2022-12-01T10:42:33.8643640Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8644389Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8645111Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8645833Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8646666Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8647394Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8647630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 0 2022-12-01T10:42:33.8647858Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:17 to store for rank: 1 2022-12-01T10:42:33.8648249Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2022-12-01T10:42:33.8648634Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:17 with 2 nodes. 2022-12-01T10:42:33.8649363Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8650095Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8650887Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8651640Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8652363Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8653107Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8653831Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8654560Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8655351Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8656089Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8656812Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8657547Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8658272Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8659003Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8659776Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8660510Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8661227Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8661944Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8662668Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8663402Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8664190Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8664923Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8665647Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8666388Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8667114Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8667844Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8668617Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8669365Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8670085Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8670820Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8671542Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8672268Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8673051Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8673785Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8674504Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8675236Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8675958Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8676686Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8677456Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8678200Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8678920Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8679655Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8680374Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8681099Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8681404Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 1 2022-12-01T10:42:33.8681649Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:18 to store for rank: 0 2022-12-01T10:42:33.8682054Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2022-12-01T10:42:33.8682623Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:18 with 2 nodes. 2022-12-01T10:42:33.8683378Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8684116Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8684847Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8685581Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8686393Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8687150Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8687398Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 0 2022-12-01T10:42:33.8687638Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:19 to store for rank: 1 2022-12-01T10:42:33.8688021Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2022-12-01T10:42:33.8688425Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:19 with 2 nodes. 2022-12-01T10:42:33.8689165Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
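The python_variable.cpp:327 warning repeated above names Tensor._fix_weakref() as the remedy. A minimal, hypothetical Python sketch of the pattern the warning text describes (the actual code path that triggers it inside these FSDP tests is not visible in this log):

    import weakref
    import torch

    t = torch.ones(3)              # tensor owned from Python
    ref = weakref.ref(t)           # the weak reference the warning refers to
    resurrected = ref()            # dereference the weak reference
    if resurrected is not None:
        # call suggested by the warning, done after dereferencing and before
        # the tensor's PyObject is deallocated
        resurrected._fix_weakref()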
2022-12-01T10:42:33.8693017Z dist init r=1, world=2
2022-12-01T10:42:33.8693126Z dist init r=0, world=2
2022-12-01T10:42:33.8693229Z ok (11.322s)
2022-12-01T10:42:33.8693593Z test_nested_always_wrap_model_offload_false_no_shard_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18092
2022-12-01T10:42:33.8693815Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18093
2022-12-01T10:42:33.8694192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.8694351Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.8694721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.8694895Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.8695280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.8695536Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.8695935Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.8696125Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.8696371Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.8696597Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.8696996Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
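The store_based_barrier_key messages above come from process-group setup: each rank writes a key into the rendezvous store and waits until every rank has done so. A rough standalone sketch of how a 2-rank group like the one in these tests might be brought up (the real harness lives in torch.testing._internal.common_distributed; backend, address and port here are illustrative assumptions):

    import os
    import torch.distributed as dist

    def init_two_rank_group(rank: int) -> None:
        # Illustrative rendezvous settings; the CI harness chooses its own.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        # init_process_group runs the store-based barrier that emits the
        # "Added key: store_based_barrier_key:<n>" / "Completed store-based barrier" lines.
        dist.init_process_group(backend="nccl", rank=rank, world_size=2)

    # Each of the two worker processes would call init_two_rank_group(0) or (1),
    # run the test body, and finally call dist.destroy_process_group().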
2022-12-01T10:42:33.8697393Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8697630Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8697860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8698098Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8698336Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8699312Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8699489Z warnings.warn( 2022-12-01T10:42:33.8700470Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8700584Z warnings.warn( 2022-12-01T10:42:33.8701203Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8701298Z warnings.warn( 2022-12-01T10:42:33.8701914Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8702028Z warnings.warn( 2022-12-01T10:42:33.8702804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8702948Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8703713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8703853Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8704090Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8704327Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8704614Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8704844Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
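The fully_sharded_data_parallel.py:1427 UserWarning above recommends passing device_id so FSDP can move a CPU-resident module to the GPU before flattening and sharding. A minimal sketch of that recommendation (hypothetical module; assumes a process group and a CUDA device are already set up):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    module = torch.nn.Linear(8, 8)  # starts on CPU, as in the warning
    wrapped = FSDP(module, device_id=torch.cuda.current_device())
    # With device_id set, FSDP moves the module to that GPU first, avoiding the
    # slower CPU flattening/sharding path the warning describes.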
2022-12-01T10:42:33.8705072Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.8709044Z dist init r=1, world=2
2022-12-01T10:42:33.8709152Z dist init r=0, world=2
2022-12-01T10:42:33.8709253Z ok (4.511s)
2022-12-01T10:42:33.8709607Z test_nested_always_wrap_model_offload_false_none_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18175 2022-12-01T10:42:33.8709827Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18176 2022-12-01T10:42:33.8710211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8710376Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8710760Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8710951Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8711313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8711486Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8711854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8712041Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8712285Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8712515Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8712965Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8713381Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8713612Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8713838Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8714071Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8714306Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8715287Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8715405Z warnings.warn( 2022-12-01T10:42:33.8716377Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8716488Z warnings.warn( 2022-12-01T10:42:33.8717090Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:42:33.8717257Z warnings.warn(
2022-12-01T10:42:33.8717878Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.8717989Z warnings.warn(
2022-12-01T10:42:33.8718619Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.8718728Z warnings.warn(
2022-12-01T10:42:33.8719357Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.8719469Z warnings.warn(
2022-12-01T10:42:33.8720241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.8720383Z warnings.warn(msg, FutureWarning)
2022-12-01T10:42:33.8721150Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.8721270Z warnings.warn(msg, FutureWarning)
2022-12-01T10:42:33.8721506Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
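The two UserWarnings above flag private collectives that these tests still call; the public replacements take the same (output, input) tensor arguments. A short migration sketch (assumes an initialized process group and, for the NCCL backend, CUDA tensors):

    import torch
    import torch.distributed as dist

    world = dist.get_world_size()
    inp = torch.ones(4, device="cuda")
    gathered = torch.empty(4 * world, device="cuda")
    # old: dist._all_gather_base(gathered, inp)
    dist.all_gather_into_tensor(gathered, inp)

    to_reduce = torch.ones(4 * world, device="cuda")
    scattered = torch.empty(4, device="cuda")
    # old: dist._reduce_scatter_base(scattered, to_reduce)
    dist.reduce_scatter_tensor(scattered, to_reduce)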
2022-12-01T10:42:33.8725270Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8725477Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8725703Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8725927Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8726288Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8726514Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8726741Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8726854Z dist init r=1, world=2 2022-12-01T10:42:33.8726965Z dist init r=0, world=2 2022-12-01T10:42:33.8727048Z ok (4.812s) 2022-12-01T10:42:33.8727420Z test_nested_always_wrap_model_offload_false_shard_grad_op_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18258 2022-12-01T10:42:33.8727639Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18259 2022-12-01T10:42:33.8728033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8728212Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8728597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8728792Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8729159Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8729312Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8729687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8729873Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8730119Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8730364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8730771Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8731235Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8731481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8731709Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8731927Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8732161Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8733151Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. 
We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8733272Z warnings.warn( 2022-12-01T10:42:33.8734248Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8734359Z warnings.warn( 2022-12-01T10:42:33.8734977Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8735155Z warnings.warn( 2022-12-01T10:42:33.8735783Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8735894Z warnings.warn( 2022-12-01T10:42:33.8736524Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8736633Z warnings.warn( 2022-12-01T10:42:33.8737242Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8737356Z warnings.warn( 2022-12-01T10:42:33.8738127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8738268Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8739032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8739172Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8739406Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8739642Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8739876Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8740160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8740386Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8740611Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
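
The two deprecation warnings above name their public replacements: torch.distributed.all_gather_into_tensor and torch.distributed.reduce_scatter_tensor. A hedged sketch of that migration, assuming an initialized process group (e.g. NCCL) with one CUDA device per rank; tensor shapes are illustrative only:

import torch
import torch.distributed as dist

# assumes an initialized process group with world_size ranks on CUDA devices
world_size = dist.get_world_size()
local = torch.ones(4, device="cuda")

# old: dist._all_gather_base(output, local)
output = torch.empty(world_size * 4, device="cuda")
dist.all_gather_into_tensor(output, local)

# old: dist._reduce_scatter_base(result, big_input)
big_input = torch.ones(world_size * 4, device="cuda")
result = torch.empty(4, device="cuda")
dist.reduce_scatter_tensor(result, big_input)
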
2022-12-01T10:42:33.8740879Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8741108Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8741336Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8741561Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8741782Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8742015Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8742226Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8742450Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8742677Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8742901Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8743122Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8743347Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8743568Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8743854Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8744076Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8744285Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8744399Z dist init r=1, world=2 2022-12-01T10:42:33.8744509Z dist init r=0, world=2 2022-12-01T10:42:33.8744612Z ok (4.812s) 2022-12-01T10:42:33.8744970Z test_nested_always_wrap_model_offload_true_no_shard_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18341 2022-12-01T10:42:33.8745186Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18342 2022-12-01T10:42:33.8745569Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8745745Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8746112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8746306Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8746669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8746841Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8747213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8747399Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8747646Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8747891Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8748279Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8748741Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8748982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8749210Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8749445Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8749677Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8750659Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8750782Z warnings.warn( 2022-12-01T10:42:33.8751762Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8751874Z warnings.warn( 2022-12-01T10:42:33.8752108Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8752323Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
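
The "loaded 62 slow tests" / "loaded 421 disabled tests" UserWarnings echo a simple pattern in common_utils: a dict of test names is loaded once and its size reported via warnings.warn. A hedged sketch of that pattern only; the file name is illustrative and not the harness's real source of these lists:

import json
import warnings

# illustrative file name; the test harness obtains its lists elsewhere
with open("slow-tests.json") as f:
    slow_tests_dict = json.load(f)
warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
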
2022-12-01T10:42:33.8752556Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8752851Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8753083Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8753308Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8753537Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8753765Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8753992Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8754200Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8754421Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8754648Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8755281Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8755394Z warnings.warn( 2022-12-01T10:42:33.8756014Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8756121Z warnings.warn( 2022-12-01T10:42:33.8756888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8757033Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8757850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8758001Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8758225Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8758458Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8758688Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8758918Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8759147Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8759375Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8759603Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8759831Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
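
The FutureWarning about torch.testing.assert_allclose() has a direct replacement, torch.testing.assert_close(). A minimal sketch of the migration; the explicit tolerances are shown only to illustrate the keyword names:

import torch
from torch.testing import assert_close

a = torch.tensor([1.0, 2.0])
b = torch.tensor([1.0, 2.0 + 1e-7])

# old: torch.testing.assert_allclose(a, b)
assert_close(a, b)                         # dtype-based default tolerances
assert_close(a, b, rtol=1e-5, atol=1e-8)   # or set both explicitly
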
2022-12-01T10:42:33.8760040Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8760264Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8760487Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8760710Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8760934Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8761221Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8761447Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8761673Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8761898Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8762107Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8762334Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8762735Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8762964Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8763187Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8763306Z dist init r=0, world=2 2022-12-01T10:42:33.8763508Z dist init r=1, world=2 2022-12-01T10:42:33.8763594Z ok (4.812s) 2022-12-01T10:42:33.8763952Z test_nested_always_wrap_model_offload_true_none_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18424 2022-12-01T10:42:33.8764176Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18425 2022-12-01T10:42:33.8764556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8764735Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8765118Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8765309Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8765669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8765850Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8766285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8766493Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8766743Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8766989Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8767392Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
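
The "Started process N with pid", "Added key: store_based_barrier_key", and "Completed store-based barrier" lines come from two worker ranks joining a single process group. A hedged, self-contained sketch of that setup outside the test harness; the gloo backend, loopback address, and fixed port are illustrative choices, not what this harness itself uses:

import os
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"          # illustrative port
    # init_process_group performs the store-based barrier logged above
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    print(f"dist init r={rank}, world={world_size}")
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)        # two ranks, as in this log
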
2022-12-01T10:42:33.8767855Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8768198Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8768433Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8768674Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8768893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8769879Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8769996Z warnings.warn( 2022-12-01T10:42:33.8770968Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8771176Z warnings.warn( 2022-12-01T10:42:33.8771413Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8771643Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8771874Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8772104Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8772333Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8772543Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8772768Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8772999Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8773225Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8773449Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8773668Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8773893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8774525Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:42:33.8774641Z warnings.warn( 2022-12-01T10:42:33.8775291Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8775411Z warnings.warn( 2022-12-01T10:42:33.8776049Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8776160Z warnings.warn( 2022-12-01T10:42:33.8776786Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8776901Z warnings.warn( 2022-12-01T10:42:33.8777671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8777813Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8778572Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8778711Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8778945Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8779160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8779455Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8779692Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8779921Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8780145Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8780370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8780598Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8780822Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8781031Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8781253Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8781483Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8781713Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8781940Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8782163Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
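
The repeated "Reducer buckets have been rebuilt in this iteration" INFO lines are emitted by DistributedDataParallel, which these FSDP parity tests use as the reference model: after the first backward pass DDP regroups gradients into communication buckets based on the order they were actually produced. A minimal hedged sketch of the DDP side, assuming an initialized process group and one GPU per rank:

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# assumes dist.init_process_group(...) has run and the CUDA device is set
model = torch.nn.Linear(8, 8).cuda()
ddp_model = DDP(model, device_ids=[torch.cuda.current_device()])

x = torch.randn(4, 8, device="cuda")
ddp_model(x).sum().backward()   # first iteration; buckets are rebuilt after it
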
2022-12-01T10:42:33.8782383Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8782605Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8782834Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8783043Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8783269Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8783489Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8783760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8783885Z dist init r=1, world=2 2022-12-01T10:42:33.8783995Z dist init r=0, world=2 2022-12-01T10:42:33.8784096Z ok (5.112s) 2022-12-01T10:42:33.8784444Z test_nested_always_wrap_model_offload_true_shard_grad_op_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18507 2022-12-01T10:42:33.8784663Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18508 2022-12-01T10:42:33.8785043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8785219Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8785602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8785796Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8786166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8786340Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8786718Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8786889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8787135Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8787381Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8787844Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8788241Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8788471Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8788700Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8788936Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8789168Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8790147Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. 
We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8790252Z warnings.warn( 2022-12-01T10:42:33.8791228Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8791339Z warnings.warn( 2022-12-01T10:42:33.8791574Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8791806Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8792043Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8792273Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8792549Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8792794Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8793020Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8793230Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8793456Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8793681Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8793907Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8794137Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8794768Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8794882Z warnings.warn( 2022-12-01T10:42:33.8795502Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8795610Z warnings.warn( 2022-12-01T10:42:33.8796222Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8796388Z warnings.warn( 2022-12-01T10:42:33.8797020Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:42:33.8797128Z warnings.warn( 2022-12-01T10:42:33.8797948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8798089Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8798849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8798992Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8799231Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8799466Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8799679Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8799909Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8800135Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8800360Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8800583Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8800810Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8801038Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8801318Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8801538Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8801765Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8801992Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8802217Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8802612Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8802843Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8803069Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8803301Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8803528Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8803737Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8803958Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8804181Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
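
The parametrized test names in this shard encode FSDP configuration: "offload_true"/"offload_false" suggest CPU parameter offload, while "no_shard"/"shard_grad_op"/"none" suggest the sharding strategy (with "none" presumably leaving it at the default). That mapping is an inference from the names, not something stated in the log. A hedged sketch of how those knobs are spelled in the public FSDP API:

import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    CPUOffload,
    ShardingStrategy,
)

# assumes an initialized process group and a CUDA device for this rank
model = torch.nn.Linear(8, 8)
fsdp_model = FSDP(
    model,
    cpu_offload=CPUOffload(offload_params=True),       # "offload_true" in the test names
    sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,   # or NO_SHARD / FULL_SHARD
    device_id=torch.cuda.current_device(),
)
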
2022-12-01T10:42:33.8804293Z dist init r=0, world=2 2022-12-01T10:42:33.8804401Z dist init r=1, world=2 2022-12-01T10:42:33.8804499Z ok (5.112s) 2022-12-01T10:42:33.8804839Z test_nested_wrapped_model_offload_false_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18590 2022-12-01T10:42:33.8805057Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18591 2022-12-01T10:42:33.8805521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8805702Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8806086Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8806278Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8806646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8806821Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8807193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8807381Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8807616Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8807864Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8808270Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8808668Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8808898Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8809128Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8809363Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8809592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8810655Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8810784Z warnings.warn( 2022-12-01T10:42:33.8811764Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
2022-12-01T10:42:33.8811861Z warnings.warn( 2022-12-01T10:42:33.8812482Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8812593Z warnings.warn( 2022-12-01T10:42:33.8813213Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8813322Z warnings.warn( 2022-12-01T10:42:33.8814085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8814286Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8815052Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8815191Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8815431Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8815649Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8815883Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8816110Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8816337Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8816564Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8816789Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8817017Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8817776Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8818511Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8819311Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8820060Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8820806Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8821549Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8822290Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8823022Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8823816Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8824543Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8825275Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8826007Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8826739Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8827465Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8827705Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8827923Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8828202Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8828450Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8828682Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8828909Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8829136Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8829358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8829581Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8829798Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8830028Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8830253Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8830474Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8830697Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8830809Z dist init r=0, world=2 2022-12-01T10:42:33.8830917Z dist init r=1, world=2 2022-12-01T10:42:33.8831016Z ok (4.411s) 2022-12-01T10:42:33.8831333Z test_nested_wrapped_model_offload_false_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18673 2022-12-01T10:42:33.8831551Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18674 2022-12-01T10:42:33.8831991Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8832173Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8832553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8832750Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8833114Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8833288Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8833640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8833826Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8834075Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8834319Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8834721Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:42:33.8835117Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8835346Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8835572Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8835804Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8836014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8837045Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8837177Z warnings.warn( 2022-12-01T10:42:33.8838158Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8838272Z warnings.warn( 2022-12-01T10:42:33.8838895Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8839006Z warnings.warn( 2022-12-01T10:42:33.8839622Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8839732Z warnings.warn( 2022-12-01T10:42:33.8840360Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8840469Z warnings.warn( 2022-12-01T10:42:33.8841141Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8841296Z warnings.warn( 2022-12-01T10:42:33.8842077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8842221Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8843232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8843378Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8843620Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8843853Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8844090Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8844323Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8844551Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8844760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8844985Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8845214Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8846051Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8846810Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8847557Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8848289Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8849032Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8849761Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8850500Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8851311Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8852044Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8852767Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8853496Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8854215Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8854951Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8855725Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8856473Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8857194Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8857439Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8857674Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8857901Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8858132Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
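The FSDP UserWarning repeated above (fully_sharded_data_parallel.py:1427) recommends passing a `device_id` so that flattening and sharding run on the GPU rather than on CPU. A minimal sketch of that call is below; it assumes an already-initialized NCCL process group and one CUDA device per rank, and uses a stand-in nn.Linear module rather than the test's real model.

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(rank: int) -> FSDP:
        # Stand-in for the real module; it starts on CPU, as in the warning.
        model = nn.Linear(8, 8)
        # device_id tells FSDP to move the module to this GPU before flattening
        # and sharding, which also satisfies the sync_module_states=True requirement.
        return FSDP(
            model,
            device_id=torch.device("cuda", rank),
            sync_module_states=True,
        )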
2022-12-01T10:42:33.8858345Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8858575Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8858804Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8859088Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8859319Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8859549Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8859777Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8860000Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8860205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8860430Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8860543Z dist init r=1, world=2 2022-12-01T10:42:33.8860651Z dist init r=0, world=2 2022-12-01T10:42:33.8860751Z ok (4.611s) 2022-12-01T10:42:33.8861094Z test_nested_wrapped_model_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18756 2022-12-01T10:42:33.8861320Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18757 2022-12-01T10:42:33.8861704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8861862Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8862245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8862438Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8862805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8862975Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8863350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8863537Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8863879Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8864299Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8864522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8864913Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8865142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8865369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8865612Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
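The store_based_barrier_key INFO lines and the "dist init r=?, world=2" prints above are produced as each rank joins the default process group. A minimal two-rank sketch that exercises the same init path follows; the MASTER_ADDR/MASTER_PORT values and the gloo backend are illustrative choices, not taken from this job.

    import os
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def worker(rank: int, world_size: int) -> None:
        os.environ["MASTER_ADDR"] = "127.0.0.1"   # illustrative rendezvous address
        os.environ["MASTER_PORT"] = "29500"       # illustrative port
        # Each rank registers itself in the store, then waits at the store-based
        # barrier until all world_size ranks have checked in (the INFO lines above).
        dist.init_process_group("gloo", rank=rank, world_size=world_size)
        print(f"dist init r={rank}, world={world_size}")
        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(worker, args=(2,), nprocs=2)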
2022-12-01T10:42:33.8865843Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8866822Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8866938Z warnings.warn( 2022-12-01T10:42:33.8867912Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8868098Z warnings.warn( 2022-12-01T10:42:33.8868730Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8868825Z warnings.warn( 2022-12-01T10:42:33.8869447Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8869557Z warnings.warn( 2022-12-01T10:42:33.8870185Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8870297Z warnings.warn( 2022-12-01T10:42:33.8870925Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.8871033Z warnings.warn( 2022-12-01T10:42:33.8871797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8871938Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8872701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8872907Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8873141Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8873378Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8873611Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
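The distributed_c10d.py:2387 and :2849 UserWarnings flag the private collectives `_all_gather_base` and `_reduce_scatter_base`; the warnings themselves name `all_gather_into_tensor` and `reduce_scatter_tensor` as the public replacements. A sketch of the public calls, assuming an initialized process group of size `world_size` and a flat tensor `local` on each rank:

    import torch
    import torch.distributed as dist

    def gather_then_reduce_scatter(local: torch.Tensor, world_size: int) -> torch.Tensor:
        # Replacement for the deprecated torch.distributed._all_gather_base:
        # every rank contributes `local`, and each receives the concatenation.
        gathered = torch.empty(world_size * local.numel(),
                               dtype=local.dtype, device=local.device)
        dist.all_gather_into_tensor(gathered, local)

        # Replacement for the deprecated torch.distributed._reduce_scatter_base:
        # the concatenated tensor is summed across ranks and split back out.
        out = torch.empty_like(local)
        dist.reduce_scatter_tensor(out, gathered)
        return out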
2022-12-01T10:42:33.8873842Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8874073Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8874297Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8874524Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8874758Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8875519Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8876255Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8876996Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8877773Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8878523Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8879261Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8880010Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8880741Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8881462Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8882236Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8883241Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8883973Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8884718Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8885444Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8886182Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8887007Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8887248Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8887486Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8887719Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8887954Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8888187Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8888419Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8888651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8888857Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8889085Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
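The recurring "Reducer buckets have been rebuilt in this iteration" INFO lines come from DistributedDataParallel: after the first backward pass it reorders its gradient buckets to match the observed gradient-ready order and logs the message once. A minimal per-rank sketch (assuming an initialized process group and one CUDA device per rank):

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def two_iterations(rank: int) -> None:
        model = nn.Linear(16, 16).cuda(rank)
        ddp = DDP(model, device_ids=[rank])
        for _ in range(2):
            x = torch.randn(4, 16, device=f"cuda:{rank}")
            ddp(x).sum().backward()
            ddp.zero_grad()
        # The bucket rebuild (and its INFO line) normally happens once,
        # at the start of the second iteration.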
2022-12-01T10:42:33.8889311Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8889534Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8889756Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8889975Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8890199Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8890316Z dist init r=0, world=2 2022-12-01T10:42:33.8890407Z dist init r=1, world=2 2022-12-01T10:42:33.8890506Z ok (4.611s) 2022-12-01T10:42:33.8890909Z test_nested_wrapped_model_offload_true_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18839 2022-12-01T10:42:33.8891146Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18840 2022-12-01T10:42:33.8891527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8891703Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8892085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8892276Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8892647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8892803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8893178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8893367Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8893613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8893856Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8894262Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8894663Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8894957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8895172Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8895409Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8895637Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8896622Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
2022-12-01T10:42:33.8896739Z warnings.warn( 2022-12-01T10:42:33.8897717Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8897832Z warnings.warn( 2022-12-01T10:42:33.8898060Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8898294Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8899042Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8899827Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8900076Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8900307Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8900517Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8900745Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8901482Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8902217Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8902937Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8903655Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8904460Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8905189Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8905927Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8906662Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8906898Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8907133Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8907862Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8908645Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8908892Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8909126Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8909355Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8909569Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8910306Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8911038Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8911777Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.8912502Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8913311Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8914042Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8914775Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8915510Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8916241Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8916963Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8917746Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8918487Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8919221Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8919948Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8920682Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8921406Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8922203Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8923233Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8923975Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8924708Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8925440Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8926164Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8926968Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8927709Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8928441Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8929168Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8929898Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8930616Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8931429Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8932155Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8932882Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8933606Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8934232Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.8934348Z warnings.warn( 2022-12-01T10:42:33.8934969Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:42:33.8935082Z warnings.warn( 2022-12-01T10:42:33.8935912Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8936069Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8936836Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.8936958Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.8937201Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8937432Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8937672Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8937906Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8938135Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8938360Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8938579Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8938805Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8939015Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8939242Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8939530Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8939753Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8939982Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8940206Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8940434Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8940655Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8940895Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8941123Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8941353Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8941583Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8941805Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8942028Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
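The FutureWarning from torch/testing/_deprecated.py:35 points at the replacement for `torch.testing.assert_allclose()`. A short sketch of `torch.testing.assert_close()`, the call the warning recommends:

    import torch
    from torch.testing import assert_close

    actual = torch.tensor([1.0, 2.0, 3.0])
    expected = torch.tensor([1.0, 2.0, 3.0 + 1e-8])
    # assert_close picks dtype-dependent default tolerances; pass rtol/atol
    # explicitly if the old assert_allclose defaults must be reproduced.
    assert_close(actual, expected)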
2022-12-01T10:42:33.8943025Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:925: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.8943157Z return iter(self.unbind(0)) 2022-12-01T10:42:33.8944191Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:925: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.8944333Z return iter(self.unbind(0)) 2022-12-01T10:42:33.8944445Z dist init r=1, world=2 2022-12-01T10:42:33.8944538Z dist init r=0, world=2 2022-12-01T10:42:33.8944638Z ok (4.712s) 2022-12-01T10:42:33.8944970Z test_nested_wrapped_model_offload_true_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18922 2022-12-01T10:42:33.8945189Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18923 2022-12-01T10:42:33.8945563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8945739Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8946122Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8946317Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8946663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.8946835Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.8947210Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.8947398Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.8947642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.8947888Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.8948353Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8948754Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.8948967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.8949195Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.8949432Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8949664Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
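The _tensor.py:925 frame in the warning above ("return iter(self.unbind(0))") shows that Python iteration over a Tensor goes through `Tensor.unbind(0)`; the decref warning is triggered internally on that path. A small sketch of the equivalence, independent of the warning itself:

    import torch

    t = torch.arange(6).reshape(3, 2)
    rows_by_iter = list(t)              # Tensor.__iter__ -> iter(t.unbind(0))
    rows_by_unbind = list(t.unbind(0))
    assert all(torch.equal(a, b) for a, b in zip(rows_by_iter, rows_by_unbind))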
2022-12-01T10:42:33.8950643Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8950763Z warnings.warn( 2022-12-01T10:42:33.8951736Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.8951846Z warnings.warn( 2022-12-01T10:42:33.8952080Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8952314Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8953119Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8953868Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8954102Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8954317Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8954542Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8954772Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8955519Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8956249Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8956981Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8957771Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8958514Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8959242Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8959984Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8960712Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8960948Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8961182Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8961963Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8962941Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8963187Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8963422Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8963634Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8963870Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.8964611Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.8965336Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
2022-12-01T10:42:33.8966078Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
[the same python_variable.cpp:327 warning is emitted many more times here; duplicate lines omitted]
2022-12-01T10:42:33.8991135Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.8991249Z warnings.warn(
2022-12-01T10:42:33.8992616Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.8992708Z warnings.warn(
2022-12-01T10:42:33.8994292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.8994434Z warnings.warn(msg, FutureWarning)
2022-12-01T10:42:33.8995572Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
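For reference on the python_variable.cpp:327 warning collapsed above: the message describes taking a Python weak reference to a Tensor, dereferencing it, and not calling Tensor._fix_weakref() afterwards. A minimal sketch of that call pattern, assuming only the public torch API; the variable names are illustrative, and whether this exact snippet reproduces the deallocation condition depends on what else keeps the tensor alive, which the log does not show:

import weakref
import torch

t = torch.ones(4)        # any tensor
ref = weakref.ref(t)     # take a weak reference to the Tensor
again = ref()            # dereference the weak reference
# Per the warning text, call _fix_weakref() after dereferencing the weak ref.
again._fix_weakref()
del again, ref, t        # with the fix applied, deallocation should not hit the warning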
2022-12-01T10:42:33.9000479Z dist init r=0, world=2
2022-12-01T10:42:33.9000586Z dist init r=1, world=2
2022-12-01T10:42:33.9000686Z ok (4.912s)
2022-12-01T10:42:33.9001031Z test_nested_wrapped_model_offload_true_shard_grad_op (__main__.TestParityWithDDP) ...
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19005 2022-12-01T10:42:33.9001253Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19006 2022-12-01T10:42:33.9001743Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9001905Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9002288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9002783Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9003162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9003337Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9003708Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9003896Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9004149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.9004378Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.9004785Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9005183Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9005413Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.9005641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.9005880Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9006116Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9007179Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9007310Z warnings.warn( 2022-12-01T10:42:33.9008287Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9008398Z warnings.warn( 2022-12-01T10:42:33.9008618Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9008845Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
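The fully_sharded_data_parallel.py:1427 UserWarning in the block above recommends passing device_id so FSDP can move a CPU module onto the GPU before flattening and sharding, which sync_module_states=True also needs. A minimal sketch of that recommendation, assuming a process group is already initialized (as it is in these tests) and using a toy nn.Linear in place of the real test model:

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

module = nn.Linear(8, 8)  # starts on CPU, which is what the warning flags
fsdp_module = FSDP(
    module,
    device_id=torch.cuda.current_device(),  # lets FSDP move the module to GPU first
    sync_module_states=True,                # per the warning, needs the module on GPU
)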
2022-12-01T10:42:33.9009598Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
[the same python_variable.cpp:327 warning recurs many more times for this test, interleaved with further Reducer-bucket messages; duplicate lines omitted]
2022-12-01T10:42:33.9010580Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.9047696Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.9047793Z warnings.warn(
2022-12-01T10:42:33.9049166Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.9049280Z warnings.warn(
2022-12-01T10:42:33.9050782Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.9050922Z warnings.warn(msg, FutureWarning)
2022-12-01T10:42:33.9052130Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
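The distributed_c10d.py warnings kept above point at the public collectives that replace the private _all_gather_base and _reduce_scatter_base helpers. A minimal migration sketch, assuming an initialized process group and a flat (1-D) per-rank tensor; names and shapes are illustrative:

import torch
import torch.distributed as dist

def gather_then_reduce_scatter(local: torch.Tensor) -> torch.Tensor:
    world_size = dist.get_world_size()

    # Replacement for the deprecated dist._all_gather_base(output, input):
    gathered = torch.empty(world_size * local.numel(),
                           dtype=local.dtype, device=local.device)
    dist.all_gather_into_tensor(gathered, local)

    # Replacement for the deprecated dist._reduce_scatter_base(output, input);
    # the input must be world_size times the size of the per-rank output.
    reduced = torch.empty_like(local)
    dist.reduce_scatter_tensor(reduced, gathered)
    return reduced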
2022-12-01T10:42:33.9057024Z dist init r=1, world=2
2022-12-01T10:42:33.9057134Z dist init r=0, world=2
2022-12-01T10:42:33.9057217Z ok (4.912s)
2022-12-01T10:42:33.9057598Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19088
2022-12-01T10:42:33.9057825Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19089
2022-12-01T10:42:33.9058211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.9058388Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.9058767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.9058960Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.9060369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.9060614Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.9061018Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.9061414Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
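The torch/testing/_deprecated.py FutureWarning that recurs throughout these tests names torch.testing.assert_close() as the replacement for assert_allclose(). A minimal before/after sketch with illustrative tensors:

import torch
from torch.testing import assert_close

actual = torch.tensor([1.0, 2.0, 3.0])
expected = torch.tensor([1.0, 2.0, 3.0])

# Deprecated since 1.12: torch.testing.assert_allclose(actual, expected)
assert_close(actual, expected)  # replacement named in the FutureWarning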
2022-12-01T10:42:33.9061644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.9061875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.9062865Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9062979Z warnings.warn( 2022-12-01T10:42:33.9063955Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9064106Z warnings.warn( 2022-12-01T10:42:33.9064736Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9064847Z warnings.warn( 2022-12-01T10:42:33.9065462Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9065570Z warnings.warn( 2022-12-01T10:42:33.9066330Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9066470Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9067237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9067376Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9068128Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9068870Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9069652Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref)
[the same python_variable.cpp:327 warning is repeated many more times here; duplicate lines omitted]
2022-12-01T10:42:33.9082598Z dist init r=1, world=2
2022-12-01T10:42:33.9082724Z dist init r=0, world=2
2022-12-01T10:42:33.9082827Z ok (4.313s)
2022-12-01T10:42:33.9083210Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none (__main__.TestParityWithDDP) ...
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19171 2022-12-01T10:42:33.9083434Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19172 2022-12-01T10:42:33.9083816Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9083994Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9084374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9084550Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9084920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9085093Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9085469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9085656Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9085904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.9086153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.9086554Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9086949Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9087167Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.9087473Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.9088474Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9088588Z warnings.warn( 2022-12-01T10:42:33.9089566Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9089682Z warnings.warn( 2022-12-01T10:42:33.9090300Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9090409Z warnings.warn( 2022-12-01T10:42:33.9091025Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9091132Z warnings.warn( 2022-12-01T10:42:33.9091763Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9091970Z warnings.warn( 2022-12-01T10:42:33.9092591Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9092701Z warnings.warn( 2022-12-01T10:42:33.9093464Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9093601Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9094362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9094507Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9095504Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.9095736Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2022-12-01T10:42:33.9096783Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.9097028Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2022-12-01T10:42:33.9097143Z dist init r=0, world=2 2022-12-01T10:42:33.9097252Z dist init r=1, world=2 2022-12-01T10:42:33.9097336Z ok (4.613s) 2022-12-01T10:42:33.9097774Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19254 2022-12-01T10:42:33.9097996Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19255 2022-12-01T10:42:33.9098374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9098554Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9098936Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9099129Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9099495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9099669Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9100024Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9100213Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9100458Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.9100765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.9101175Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9101575Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9101806Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.9102032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.9103009Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9103131Z warnings.warn( 2022-12-01T10:42:33.9104088Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9104203Z warnings.warn( 2022-12-01T10:42:33.9104821Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9104929Z warnings.warn( 2022-12-01T10:42:33.9105545Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9105705Z warnings.warn( 2022-12-01T10:42:33.9106356Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9106466Z warnings.warn( 2022-12-01T10:42:33.9107096Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9107205Z warnings.warn( 2022-12-01T10:42:33.9107973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9108105Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9108870Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9109010Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9110007Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.9110296Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2022-12-01T10:42:33.9111296Z /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.9111527Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2022-12-01T10:42:33.9111641Z dist init r=1, world=2 2022-12-01T10:42:33.9111751Z dist init r=0, world=2 2022-12-01T10:42:33.9111851Z ok (4.513s) 2022-12-01T10:42:33.9112228Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19337 2022-12-01T10:42:33.9112454Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19338 2022-12-01T10:42:33.9112809Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9112985Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9113367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9113560Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9113928Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9114098Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9114476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9114663Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9114941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.9115199Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.9115604Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9116000Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9116230Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.9116457Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.9117436Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9117551Z warnings.warn( 2022-12-01T10:42:33.9118524Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9118635Z warnings.warn( 2022-12-01T10:42:33.9119451Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9120197Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9120932Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9121678Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9122627Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9123390Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9124229Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9124994Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9125722Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9126467Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9127194Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9127929Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9128740Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9129481Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9130208Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9130946Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9131671Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9132386Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9133161Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9133909Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9134636Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9135369Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.9136091Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9136825Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9137610Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9138344Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9139072Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9139809Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9140539Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9141305Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9142125Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9142879Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9143603Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9144337Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9145063Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9145795Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9146522Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9147315Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9148038Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9148772Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9149495Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9150218Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9150985Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9151728Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9152447Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9153183Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9153814Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9153928Z warnings.warn( 2022-12-01T10:42:33.9154545Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9154653Z warnings.warn( 2022-12-01T10:42:33.9155423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9155629Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9156398Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9156538Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9156633Z dist init r=0, world=2 2022-12-01T10:42:33.9156743Z dist init r=1, world=2 2022-12-01T10:42:33.9156844Z ok (4.512s) 2022-12-01T10:42:33.9157220Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_none (__main__.TestParityWithDDP) ... 
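The deprecation warnings for `torch.distributed._all_gather_base` and `_reduce_scatter_base` point at the public replacements named in the messages. A short sketch of the suggested calls, assuming an initialized process group and NCCL-style CUDA tensors (shapes are illustrative):

import torch
import torch.distributed as dist

# Assumes dist.init_process_group(...) has already run on every rank.
world_size = dist.get_world_size()
local = torch.ones(4, device="cuda")

# Replacement for the private _all_gather_base: each rank receives the concatenation of all shards.
gathered = torch.empty(world_size * 4, device="cuda")
dist.all_gather_into_tensor(gathered, local)

# Replacement for the private _reduce_scatter_base: each rank receives its reduced shard.
shard = torch.empty(4, device="cuda")
dist.reduce_scatter_tensor(shard, gathered)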
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19420 2022-12-01T10:42:33.9157443Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19421 2022-12-01T10:42:33.9157821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9158000Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9158380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9158552Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9158916Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9159090Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9159458Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9159648Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9159896Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.9160191Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.9160616Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9161016Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9161226Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.9161454Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.9162647Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9162782Z warnings.warn( 2022-12-01T10:42:33.9163767Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9163881Z warnings.warn( 2022-12-01T10:42:33.9164626Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9165454Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9166177Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9166920Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9167656Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9168399Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9169125Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9169927Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9170673Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9171406Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9172132Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9172854Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9173570Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9174368Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9175096Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9175829Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9176556Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9177290Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9178010Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9178792Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9179526Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9180258Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.9180983Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9181715Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9182435Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9183231Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9183958Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9184691Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9185420Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9186152Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9186874Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9187660Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9188398Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9189129Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9189856Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9190589Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9191310Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9192103Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9192825Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9193554Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9194281Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9195010Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9195733Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9196514Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9197248Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9197977Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9198606Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9198702Z warnings.warn( 2022-12-01T10:42:33.9199324Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9199434Z warnings.warn( 2022-12-01T10:42:33.9200069Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9200235Z warnings.warn( 2022-12-01T10:42:33.9200873Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9200984Z warnings.warn( 2022-12-01T10:42:33.9201751Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9201890Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9202875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9203028Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9203766Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.9204511Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9204625Z dist init r=0, world=2 2022-12-01T10:42:33.9204734Z dist init r=1, world=2 2022-12-01T10:42:33.9204834Z ok (4.612s) 2022-12-01T10:42:33.9205218Z test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19503 2022-12-01T10:42:33.9205518Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19504 2022-12-01T10:42:33.9205913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9206091Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9206479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9206654Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9207019Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9207191Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9207566Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9207758Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9208007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.9208252Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.9208654Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9209049Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9209261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.9209488Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.9210555Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9210674Z warnings.warn( 2022-12-01T10:42:33.9211648Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. 
We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9211762Z warnings.warn( 2022-12-01T10:42:33.9212508Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9213253Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9213977Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9214763Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9215504Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9216243Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9216972Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9217702Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9218425Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9219234Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9219963Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9220697Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9221427Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9222157Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9222880Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9223659Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9224402Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9225119Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9225847Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9226578Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.9227303Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9228097Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9228824Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9229554Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9230281Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9231011Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9231732Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9232513Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9233248Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9233981Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9234706Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9235436Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9236152Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9236946Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9237674Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9238403Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9239130Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9239856Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9240575Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9241387Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9242130Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9243084Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9243816Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9244551Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9245273Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9246006Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9246732Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9246829Z warnings.warn( 2022-12-01T10:42:33.9247451Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9247560Z warnings.warn( 2022-12-01T10:42:33.9248198Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9248316Z warnings.warn( 2022-12-01T10:42:33.9248942Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9249052Z warnings.warn( 2022-12-01T10:42:33.9249824Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-12-01T10:42:33.9249963Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9250805Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9250961Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9251695Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9252431Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9252546Z dist init r=1, world=2 2022-12-01T10:42:33.9252661Z dist init r=0, world=2 2022-12-01T10:42:33.9252760Z ok (4.612s) 2022-12-01T10:42:33.9253102Z test_transformer_offload_false_no_shard_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19586 2022-12-01T10:42:33.9253322Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19587 2022-12-01T10:42:33.9253695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9253869Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9254232Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9254426Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9254790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9255026Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9255412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9255604Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9255854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.9256101Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.9256505Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9256885Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9257118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.9257350Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.9257589Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
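The UserWarnings from distributed_c10d.py above flag two private collectives that this test suite still calls and name their public replacements. A minimal sketch of the migration, assuming an already-initialized NCCL process group and a hypothetical 4-element per-rank tensor (names like `local`, `gathered`, and `shard` are illustrative, not from the test):

    import torch
    import torch.distributed as dist

    # Assumes dist.init_process_group("nccl", ...) has already run on every rank.
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    local = torch.full((4,), float(rank), device="cuda")

    # Previously: dist._all_gather_base(gathered, local)
    gathered = torch.empty(world_size * 4, device="cuda")
    dist.all_gather_into_tensor(gathered, local)

    # Previously: dist._reduce_scatter_base(shard, full_input)
    full_input = torch.ones(world_size * 4, device="cuda")
    shard = torch.empty(4, device="cuda")
    dist.reduce_scatter_tensor(shard, full_input)

As the deprecation messages indicate, the public functions are intended as drop-in replacements for the private pair.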
2022-12-01T10:42:33.9257825Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9258801Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9258916Z warnings.warn( 2022-12-01T10:42:33.9259938Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9260065Z warnings.warn( 2022-12-01T10:42:33.9260693Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9260803Z warnings.warn( 2022-12-01T10:42:33.9261419Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9261511Z warnings.warn( 2022-12-01T10:42:33.9262285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9262427Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9263187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9263325Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9263565Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9263800Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9264096Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9264328Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9264542Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9264769Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9265530Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9266276Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9267012Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9267757Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9268484Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9269276Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9270016Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9270749Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9270987Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9271226Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9271462Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9271693Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9271920Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9272142Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9273170Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:965: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.9273420Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2022-12-01T10:42:33.9274443Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py:965: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.9274645Z _ext_post_unflatten_transform(subtensor.view(shape), param_extension) 2022-12-01T10:42:33.9274882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9275121Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9275357Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9275590Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9275817Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9276038Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9276264Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9276473Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9276702Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9276927Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9277042Z dist init r=1, world=2 2022-12-01T10:42:33.9277150Z dist init r=0, world=2 2022-12-01T10:42:33.9277250Z ok (6.014s) 2022-12-01T10:42:33.9277639Z test_transformer_offload_false_none_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19669 2022-12-01T10:42:33.9277872Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19670 2022-12-01T10:42:33.9278233Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9278408Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9278790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9278984Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9279350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9279524Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9279898Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9280090Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9280338Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.9280568Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.9280969Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9281362Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9281701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.9281933Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.9282170Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9282625Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9283631Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9283745Z warnings.warn( 2022-12-01T10:42:33.9284728Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9284846Z warnings.warn( 2022-12-01T10:42:33.9285449Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:42:33.9285559Z warnings.warn( 2022-12-01T10:42:33.9286174Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9286287Z warnings.warn( 2022-12-01T10:42:33.9286994Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9287118Z warnings.warn( 2022-12-01T10:42:33.9287752Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9287862Z warnings.warn( 2022-12-01T10:42:33.9288633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9288776Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9289521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9289665Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9289903Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9290141Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9290374Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9290604Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9290826Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9291130Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9292200Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:586: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.9292340Z (rank, world_num_valid_indices[rank]) 2022-12-01T10:42:33.9293393Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:608: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 
2022-12-01T10:42:33.9293508Z world_indices[ 2022-12-01T10:42:33.9293728Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9293958Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9294187Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9294415Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9294642Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9294866Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9295670Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9296437Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9297172Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9297919Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9298646Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9299366Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9300084Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9300888Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.9301122Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9301337Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9301569Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9301798Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9302030Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9302258Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9303005Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9303744Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9304519Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9305275Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9306003Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9306726Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9307446Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9308184Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9308415Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
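The fully_sharded_data_parallel.py:1427 UserWarning repeated throughout this shard suggests passing `device_id` so FSDP moves the wrapped module to GPU before flattening and sharding. A minimal sketch of that construction, with a hypothetical CPU-built module standing in for the test's transformer:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Hypothetical stand-in for the test's CPU-constructed model; assumes the
    # default process group is already initialized, as the test harness does.
    model = nn.Linear(8, 8)

    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),  # let FSDP move the module to the local GPU
        sync_module_states=True,                 # per the warning, this flag needs the module on GPU
    )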
2022-12-01T10:42:33.9308703Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9308940Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9309155Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9309272Z dist init r=0, world=2 2022-12-01T10:42:33.9309381Z dist init r=1, world=2 2022-12-01T10:42:33.9309482Z ok (6.114s) 2022-12-01T10:42:33.9309834Z test_transformer_offload_false_shard_grad_op_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19752 2022-12-01T10:42:33.9310054Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19753 2022-12-01T10:42:33.9310434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9310615Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9310976Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9311171Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9311535Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9311709Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9312082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9312273Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9312519Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.9312763Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.9313169Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9313609Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9313853Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.9314081Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.9314316Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9314550Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9315541Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
2022-12-01T10:42:33.9315660Z warnings.warn( 2022-12-01T10:42:33.9316632Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9316742Z warnings.warn( 2022-12-01T10:42:33.9317358Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9317528Z warnings.warn( 2022-12-01T10:42:33.9318135Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9318246Z warnings.warn( 2022-12-01T10:42:33.9318880Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9318991Z warnings.warn( 2022-12-01T10:42:33.9319619Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9319730Z warnings.warn( 2022-12-01T10:42:33.9320495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9320637Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9321404Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9321542Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9321778Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9321996Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9322232Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9322867Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9323126Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9323352Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9324429Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:586: UserWarning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.9324567Z (rank, world_num_valid_indices[rank]) 2022-12-01T10:42:33.9325625Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:608: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.9325735Z world_indices[ 2022-12-01T10:42:33.9325968Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9326200Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9326412Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9326639Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9326952Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9327182Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9327939Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9328681Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9329414Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9330161Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9330887Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9331675Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9332423Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9333162Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9333400Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9333638Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9333873Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9334086Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9334316Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9334536Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9335277Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9336005Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9336790Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9337530Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9338257Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9338998Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.9339720Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9340498Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9340746Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9341014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9341248Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9341479Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9341591Z dist init r=1, world=2 2022-12-01T10:42:33.9341683Z dist init r=0, world=2 2022-12-01T10:42:33.9341783Z ok (6.314s) 2022-12-01T10:42:33.9342122Z test_transformer_offload_true_no_shard_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19835 2022-12-01T10:42:33.9342344Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19836 2022-12-01T10:42:33.9342728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9342904Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9343282Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9343472Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9343818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:42:33.9343988Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:42:33.9344360Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:42:33.9344611Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:42:33.9344858Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:42:33.9345105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:42:33.9345507Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9345901Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:42:33.9346131Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:42:33.9346342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:42:33.9346575Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
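The FutureWarning from torch/testing/_deprecated.py:35 seen repeatedly above is the assert_allclose deprecation; the message points to torch.testing.assert_close as the replacement. A small illustrative sketch with made-up tensors:

    import torch
    from torch.testing import assert_close

    a = torch.randn(4)
    b = a.clone()

    # Previously: torch.testing.assert_allclose(a, b)
    assert_close(a, b)                             # tolerances picked from the dtype by default
    assert_close(a, b + 1e-9, rtol=0, atol=1e-6)   # explicit tolerances are keyword-only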
2022-12-01T10:42:33.9346812Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9347791Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9347907Z warnings.warn( 2022-12-01T10:42:33.9348874Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:42:33.9348984Z warnings.warn( 2022-12-01T10:42:33.9349780Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9350543Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9351273Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9352007Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9352729Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9353467Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9354256Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.9355003Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9355733Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9356473Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9356714Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9356949Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9357684Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9358470Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9358716Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9358932Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9359672Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9360409Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9360648Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9360878Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9361614Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9362337Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9362855Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9363091Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9363835Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9364570Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9365302Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9366035Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9366758Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9367569Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9368312Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9369050Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9369777Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
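The decref warning repeated above comes from PyTorch's python_variable.cpp and describes a specific failure mode: the C++ Tensor is freed while a Python-side weak reference to its wrapper is still alive. A minimal Python sketch of the weak-reference pattern the message refers to follows; the tensor, the weakref, and the direct call to the private Tensor._fix_weakref() method are illustrative only (in this test run the warning is triggered internally, not by user code).

    import weakref
    import torch

    t = torch.ones(3)            # strong reference keeps the Python wrapper alive
    wr = weakref.ref(t)          # weak reference does not
    assert wr() is t             # dereferencing while t is alive is fine

    # The remedy the warning names is the private Tensor._fix_weakref(), which
    # re-syncs the Python wrapper after such a dereference; calling it on a
    # healthy tensor like this one is harmless and only shows where it would go.
    t._fix_weakref()

    del t                        # drop the last strong reference...
    assert wr() is None          # ...and the weak reference is cleared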
2022-12-01T10:42:33.9388461Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.9391300Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
2022-12-01T10:42:33.9391414Z warnings.warn(
2022-12-01T10:42:33.9392911Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-12-01T10:42:33.9393037Z warnings.warn(msg, FutureWarning)
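The FutureWarning above names the documented replacement for torch.testing.assert_allclose(). A minimal sketch of the migration, using made-up tensors purely for illustration:

    import torch
    from torch.testing import assert_close

    actual = torch.tensor([1.0, 2.0, 3.0])
    expected = actual + 1e-8     # tiny perturbation, well inside default float32 tolerances

    # old, deprecated spelling (scheduled for removal):
    #   torch.testing.assert_allclose(actual, expected)
    # replacement suggested by the warning:
    assert_close(actual, expected)   # raises AssertionError on mismatched values/dtype/shape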
2022-12-01T10:42:33.9438072Z dist init r=1, world=2
2022-12-01T10:42:33.9438176Z dist init r=0, world=2
2022-12-01T10:42:33.9438277Z ok (6.514s)
2022-12-01T10:42:33.9438674Z test_transformer_offload_true_none_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19918
2022-12-01T10:42:33.9438907Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19919
2022-12-01T10:42:33.9439267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests
2022-12-01T10:42:33.9439443Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2022-12-01T10:42:33.9439823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests
2022-12-01T10:42:33.9440015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2022-12-01T10:42:33.9441390Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.9441634Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.9442033Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.9442637Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.9442878Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.9443193Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.9443429Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.9444658Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication.
2022-12-01T10:42:33.9444770Z warnings.warn(
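The FSDP warning above recommends passing the device_id argument so flattening and sharding run on the GPU, which sync_module_states=True also needs. A hedged sketch of that call with a toy module; it assumes an already-initialized process group and one GPU per rank, and none of the names below come from the test itself:

    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # assumes dist.init_process_group(...) already ran on every rank
    rank = dist.get_rank()               # also assumed to equal the local GPU index
    model = nn.Linear(8, 8)              # toy module, still on CPU at this point

    # device_id lets FSDP move the module and shard it on the GPU instead of CPU,
    # and satisfies the GPU-communication requirement of sync_module_states=True
    sharded = FSDP(model, device_id=rank, sync_module_states=True)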
2022-12-01T10:42:33.9476005Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
2022-12-01T10:42:33.9476113Z warnings.warn(
2022-12-01T10:42:33.9477827Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:925: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.)
2022-12-01T10:42:33.9477955Z return iter(self.unbind(0))
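The two UserWarnings from distributed_c10d.py flag the private collectives _all_gather_base and _reduce_scatter_base and name their public replacements. A minimal sketch of the replacement calls; it assumes an initialized NCCL process group with one GPU per rank and is illustrative rather than the test's own code:

    import torch
    import torch.distributed as dist

    # assumes dist.init_process_group("nccl", ...) already ran and one GPU per rank
    world = dist.get_world_size()
    device = torch.device("cuda", dist.get_rank())   # assumed rank == local GPU index
    x = torch.ones(4, device=device)

    # replacement for torch.distributed._all_gather_base
    gathered = torch.empty(4 * world, device=device)
    dist.all_gather_into_tensor(gathered, x)         # every rank's x, concatenated

    # replacement for torch.distributed._reduce_scatter_base
    y = torch.ones(4 * world, device=device)
    shard = torch.empty(4, device=device)
    dist.reduce_scatter_tensor(shard, y)             # sum-reduce across ranks, keep this rank's shard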
2022-12-01T10:42:33.9492339Z dist init r=1, world=2
2022-12-01T10:42:33.9492480Z dist init r=0, world=2
2022-12-01T10:42:33.9492592Z ok (7.016s)
2022-12-01T10:42:33.9492947Z test_transformer_offload_true_shard_grad_op_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20001
2022-12-01T10:42:33.9493164Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20002
2022-12-01T10:42:33.9495612Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
2022-12-01T10:42:33.9495854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
2022-12-01T10:42:33.9496254Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.9496648Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
2022-12-01T10:42:33.9496942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0
2022-12-01T10:42:33.9497153Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1
2022-12-01T10:42:33.9497387Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
2022-12-01T10:42:33.9515140Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it.
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9515865Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9516595Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9517319Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9518046Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9518769Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9519551Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9520289Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9521018Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9521747Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9522686Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:42:33.9523428Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9524245Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9524969Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9525698Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9525939Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9526161Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9526898Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9527623Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9527862Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9528157Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9528799Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9528911Z warnings.warn( 2022-12-01T10:42:33.9529530Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:42:33.9529635Z warnings.warn( 2022-12-01T10:42:33.9530266Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:42:33.9530379Z warnings.warn( 2022-12-01T10:42:33.9531009Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:42:33.9531102Z warnings.warn( 2022-12-01T10:42:33.9532080Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:925: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.9532207Z return iter(self.unbind(0)) 2022-12-01T10:42:33.9533246Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:925: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:42:33.9533376Z return iter(self.unbind(0)) 2022-12-01T10:42:33.9534147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9534289Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9535047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:42:33.9535194Z warnings.warn(msg, FutureWarning) 2022-12-01T10:42:33.9535433Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9535662Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9535894Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9536109Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9536855Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9537642Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9537886Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9538115Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
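The deprecation warnings in this block each name their replacement: `torch.distributed.all_gather_into_tensor` for the private `_all_gather_base`, `torch.distributed.reduce_scatter_tensor` for `_reduce_scatter_base`, and `torch.testing.assert_close` for `assert_allclose`. A small sketch of the replacement calls, assuming a torchrun launch with one CUDA device per rank; shapes and values are illustrative.

```python
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # assumes torchrun-provided env vars
rank, world_size = dist.get_rank(), dist.get_world_size()
device = torch.device("cuda", rank % torch.cuda.device_count())
torch.cuda.set_device(device)

local = torch.full((4,), float(rank), device=device)

# Was: dist._all_gather_base(gathered, local)   (private, deprecated)
gathered = torch.empty(world_size * 4, device=device)
dist.all_gather_into_tensor(gathered, local)

# Was: dist._reduce_scatter_base(shard, gathered)   (private, deprecated)
# Rank r receives the sum of chunk r across all ranks.
shard = torch.empty(4, device=device)
dist.reduce_scatter_tensor(shard, gathered)

# Was: torch.testing.assert_allclose(shard, expected)   (deprecated since 1.12)
expected = torch.full((4,), float(rank * world_size), device=device)
torch.testing.assert_close(shard, expected)

dist.destroy_process_group()
```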
2022-12-01T10:42:33.9538340Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9538571Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9538800Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9539025Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9539252Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9539458Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9540206Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9540973Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9541210Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9541514Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9541743Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9541973Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9542711Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9543436Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9543671Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9543900Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9544119Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9544330Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9545052Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9545772Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:42:33.9546056Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9546296Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:42:33.9546411Z dist init r=1, world=2 2022-12-01T10:42:33.9546519Z dist init r=0, world=2 2022-12-01T10:42:33.9546619Z ok (6.916s) 2022-12-01T10:42:33.9546643Z 2022-12-01T10:42:33.9546916Z ---------------------------------------------------------------------- 2022-12-01T10:42:33.9547032Z Ran 59 tests in 358.490s 2022-12-01T10:42:33.9547053Z 2022-12-01T10:42:33.9547143Z OK (skipped=5) 2022-12-01T10:42:33.9547163Z 2022-12-01T10:42:33.9547284Z Generating XML reports... 2022-12-01T10:42:33.9547694Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestHooks-20221201103634.xml 2022-12-01T10:42:33.9548099Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestNoGrad-20221201103634.xml 2022-12-01T10:42:33.9548513Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParamInit-20221201103634.xml 2022-12-01T10:42:33.9548939Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParityWithDDP-20221201103634.xml 2022-12-01T10:42:33.9548959Z 2022-12-01T10:42:33.9549468Z ##[endgroup] 2022-12-01T10:42:33.9549929Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_core (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_core_nmavg695) 2022-12-01T10:42:33.9549949Z 2022-12-01T10:42:33.9550205Z Running distributed/fsdp/test_fsdp_state_dict ... [2022-12-01 10:42:33.653534] 2022-12-01T10:42:33.9550686Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_state_dict.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:42:33.653803] 2022-12-01T10:46:50.8022416Z 2022-12-01T10:46:50.8023089Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_state_dict 2022-12-01T10:46:50.8025696Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_state_dict (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_state_dict_4w1ggyy6) 2022-12-01T10:46:50.8028423Z 2022-12-01T10:46:50.8029181Z Running tests... 2022-12-01T10:46:50.8030060Z ---------------------------------------------------------------------- 2022-12-01T10:46:50.8030991Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_state_dict 2022-12-01T10:46:50.8032044Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8033095Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:46:50.8043790Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20119 2022-12-01T10:46:50.8044750Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20120 2022-12-01T10:46:50.8046075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8046751Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8047348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8047842Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8048432Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8048881Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8049894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8050534Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8051959Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8052603Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8053260Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8053977Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8054901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8055851Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8057540Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8058634Z warnings.warn( 2022-12-01T10:46:50.8060123Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8061094Z warnings.warn( 2022-12-01T10:46:50.8061547Z dist init r=1, world=2 2022-12-01T10:46:50.8061997Z dist init r=0, world=2 2022-12-01T10:46:50.8062407Z ok (5.089s) 2022-12-01T10:46:50.8063335Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8064596Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20198 2022-12-01T10:46:50.8065810Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20199 2022-12-01T10:46:50.8067010Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8067886Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8068937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8069794Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8070907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8071695Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8072742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8073639Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8074484Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8075400Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8076651Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8077972Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8078924Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8079792Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8080455Z dist init r=0, world=2 2022-12-01T10:46:50.8080893Z dist init r=1, world=2 2022-12-01T10:46:50.8081298Z ok (3.411s) 2022-12-01T10:46:50.8082263Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8084169Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20277 2022-12-01T10:46:50.8084921Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20278 2022-12-01T10:46:50.8085532Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8085986Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8086815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8087535Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8088704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8089502Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8090089Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8091138Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8091580Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8092079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8092750Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8093815Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8094470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8094952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8095836Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8096388Z warnings.warn( 2022-12-01T10:46:50.8097115Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8097659Z warnings.warn( 2022-12-01T10:46:50.8097908Z dist init r=1, world=2 2022-12-01T10:46:50.8098144Z dist init r=0, world=2 2022-12-01T10:46:50.8098377Z ok (3.612s) 2022-12-01T10:46:50.8098857Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8099516Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20356 2022-12-01T10:46:50.8100027Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20357 2022-12-01T10:46:50.8100636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8101087Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8101645Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8102119Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8102697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8103234Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8103811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8104273Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8104724Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8105220Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8105856Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8106539Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8107063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8107515Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8107873Z dist init r=0, world=2 2022-12-01T10:46:50.8108124Z dist init r=1, world=2 2022-12-01T10:46:50.8108342Z ok (3.311s) 2022-12-01T10:46:50.8108826Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8109483Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20435 2022-12-01T10:46:50.8110004Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20436 2022-12-01T10:46:50.8110678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8111126Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8111690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8112138Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8112689Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8113150Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8113734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8114191Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8114625Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8115125Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8115790Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8116461Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8116983Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8117451Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8118318Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8118854Z warnings.warn( 2022-12-01T10:46:50.8119661Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8120219Z warnings.warn( 2022-12-01T10:46:50.8120533Z dist init r=1, world=2 2022-12-01T10:46:50.8120911Z dist init r=0, world=2 2022-12-01T10:46:50.8121247Z ok (3.511s) 2022-12-01T10:46:50.8121802Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8122860Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20514 2022-12-01T10:46:50.8123537Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20515 2022-12-01T10:46:50.8140019Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8140567Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8141204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8141681Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8142250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8142696Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8143263Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8143725Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8144165Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8144833Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8145504Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8146189Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8146696Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8147173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8147526Z dist init r=1, world=2 2022-12-01T10:46:50.8147758Z dist init r=0, world=2 2022-12-01T10:46:50.8147993Z ok (3.411s) 2022-12-01T10:46:50.8148479Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8149155Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20593 2022-12-01T10:46:50.8149666Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20594 2022-12-01T10:46:50.8150280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8150732Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8151292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8151764Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8152342Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8152788Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8153336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8153884Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8154357Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8154841Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8155509Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8156191Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8156712Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8157167Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8158044Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8158601Z warnings.warn( 2022-12-01T10:46:50.8159352Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8159877Z warnings.warn( 2022-12-01T10:46:50.8160126Z dist init r=1, world=2 2022-12-01T10:46:50.8160376Z dist init r=0, world=2 2022-12-01T10:46:50.8160596Z ok (3.511s) 2022-12-01T10:46:50.8161079Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8161800Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20672 2022-12-01T10:46:50.8162333Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20673 2022-12-01T10:46:50.8163577Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8164036Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8164614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8165084Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8165646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8166097Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8166669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8167118Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8167577Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8168083Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8168740Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8169411Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8169932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8170403Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8170760Z dist init r=1, world=2 2022-12-01T10:46:50.8170993Z dist init r=0, world=2 2022-12-01T10:46:50.8171229Z ok (3.511s) 2022-12-01T10:46:50.8171824Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8172486Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20751 2022-12-01T10:46:50.8173013Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20752 2022-12-01T10:46:50.8173625Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8174075Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8174633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8175108Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8175687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8176108Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8176676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8177140Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8177592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8178072Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8178728Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8179510Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8180036Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8180488Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8181347Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8181898Z warnings.warn( 2022-12-01T10:46:50.8182650Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8183176Z warnings.warn( 2022-12-01T10:46:50.8184067Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8185318Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8186539Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8187832Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8188455Z dist init r=0, world=2 2022-12-01T10:46:50.8188686Z dist init r=1, world=2 2022-12-01T10:46:50.8188922Z ok (3.511s) 2022-12-01T10:46:50.8189416Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8190065Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20830 2022-12-01T10:46:50.8190588Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20831 2022-12-01T10:46:50.8191201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8191656Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8192215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8192678Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8193254Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8193677Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8194243Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8194703Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8195225Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8195711Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8196370Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8197056Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8197578Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8198031Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8198376Z dist init r=0, world=2 2022-12-01T10:46:50.8198626Z dist init r=1, world=2 2022-12-01T10:46:50.8198849Z ok (3.511s) 2022-12-01T10:46:50.8199333Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8200005Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20909 2022-12-01T10:46:50.8200528Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20910 2022-12-01T10:46:50.8201123Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8201572Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8202142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8203162Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8203775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8204225Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8204891Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8205359Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8205816Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8206315Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8206977Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8207648Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8208168Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8208638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8209509Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8210048Z warnings.warn( 2022-12-01T10:46:50.8210806Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8211347Z warnings.warn( 2022-12-01T10:46:50.8212233Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8213570Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:46:50.8214184Z dist init r=1, world=2 2022-12-01T10:46:50.8214435Z dist init r=0, world=2 2022-12-01T10:46:50.8214673Z ok (3.511s) 2022-12-01T10:46:50.8215143Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8215806Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20988 2022-12-01T10:46:50.8216334Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20989 2022-12-01T10:46:50.8216945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8217380Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8217948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8218409Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8218967Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8219409Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8219977Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8220436Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8220872Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8221440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8222155Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8222849Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8223352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8223821Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8224174Z dist init r=1, world=2 2022-12-01T10:46:50.8224404Z dist init r=0, world=2 2022-12-01T10:46:50.8224641Z ok (3.412s) 2022-12-01T10:46:50.8225130Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8225797Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21067 2022-12-01T10:46:50.8226306Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21068 2022-12-01T10:46:50.8226915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8227365Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8227920Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8228388Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8228959Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8229472Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8230026Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8230493Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8230947Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8231433Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8232093Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8232774Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8233294Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8233751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8234617Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8235174Z warnings.warn( 2022-12-01T10:46:50.8235927Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8236452Z warnings.warn( 2022-12-01T10:46:50.8237337Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8238637Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8239883Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8241117Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8241731Z dist init r=0, world=2 2022-12-01T10:46:50.8241969Z dist init r=1, world=2 2022-12-01T10:46:50.8242207Z ok (3.511s) 2022-12-01T10:46:50.8243045Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8243702Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21146 2022-12-01T10:46:50.8244225Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21147 2022-12-01T10:46:50.8244840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8245296Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8245955Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8246428Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8247006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8247452Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8247998Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8248462Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8248916Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8249403Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8250069Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8250760Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8251284Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8251734Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8252083Z dist init r=1, world=2 2022-12-01T10:46:50.8252332Z dist init r=0, world=2 2022-12-01T10:46:50.8252551Z ok (3.411s) 2022-12-01T10:46:50.8253039Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8253703Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21225 2022-12-01T10:46:50.8254232Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21226 2022-12-01T10:46:50.8254901Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8255366Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8255946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8256416Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8256976Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8257417Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8257982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8258431Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8258889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8259391Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8260048Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8260719Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8261240Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8261712Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8262582Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8263185Z warnings.warn( 2022-12-01T10:46:50.8263946Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8264488Z warnings.warn( 2022-12-01T10:46:50.8265375Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8266595Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:46:50.8267204Z dist init r=1, world=2 2022-12-01T10:46:50.8267460Z dist init r=0, world=2 2022-12-01T10:46:50.8267699Z ok (3.611s) 2022-12-01T10:46:50.8268166Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8268826Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21304 2022-12-01T10:46:50.8269352Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21305 2022-12-01T10:46:50.8269965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8270395Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8270967Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8271490Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8272084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8272507Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8273072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8273536Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8273971Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8274473Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8275131Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8275821Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8276327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8276793Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8277147Z dist init r=0, world=2 2022-12-01T10:46:50.8277380Z dist init r=1, world=2 2022-12-01T10:46:50.8277618Z ok (3.411s) 2022-12-01T10:46:50.8278099Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8278752Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21383 2022-12-01T10:46:50.8279330Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21384 2022-12-01T10:46:50.8279944Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8280402Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8280977Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8281424Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8282001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8282795Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8283358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8283828Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8284283Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8284784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8285421Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8286102Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8286619Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8287085Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8287935Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8288491Z warnings.warn( 2022-12-01T10:46:50.8289331Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8289889Z warnings.warn( 2022-12-01T10:46:50.8290119Z dist init r=0, world=2 2022-12-01T10:46:50.8290369Z dist init r=1, world=2 2022-12-01T10:46:50.8290606Z ok (3.511s) 2022-12-01T10:46:50.8291063Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8291714Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21462 2022-12-01T10:46:50.8292244Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21463 2022-12-01T10:46:50.8292844Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8293291Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8293861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8294328Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8294885Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8295328Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8295890Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8296436Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8296875Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8297375Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8298030Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8298700Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8299221Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8299686Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8300549Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8301082Z warnings.warn( 2022-12-01T10:46:50.8301832Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8302376Z warnings.warn( 2022-12-01T10:46:50.8303260Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8304501Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8305779Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8306401Z dist init r=0, world=2 2022-12-01T10:46:50.8306654Z dist init r=1, world=2 2022-12-01T10:46:50.8306896Z ok (3.512s) 2022-12-01T10:46:50.8307356Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8308016Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21541 2022-12-01T10:46:50.8308547Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21542 2022-12-01T10:46:50.8309165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8309598Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8310172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8310643Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8311206Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8311648Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8312217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8312746Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8313188Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8313693Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8314352Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8315037Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8315540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8316011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8316876Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8317425Z warnings.warn( 2022-12-01T10:46:50.8318167Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8318712Z warnings.warn( 2022-12-01T10:46:50.8319595Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:46:50.8320824Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8322167Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8323621Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8324228Z dist init r=0, world=2 2022-12-01T10:46:50.8324478Z dist init r=1, world=2 2022-12-01T10:46:50.8324717Z ok (3.511s) 2022-12-01T10:46:50.8325179Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8325836Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21620 2022-12-01T10:46:50.8326361Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21621 2022-12-01T10:46:50.8326971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8327401Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8327970Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8328434Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8328992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8329535Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8330113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8330573Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8331007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8331506Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8332159Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8332843Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:46:50.8333355Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8333825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8334694Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8335247Z warnings.warn( 2022-12-01T10:46:50.8335986Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8336529Z warnings.warn( 2022-12-01T10:46:50.8337412Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8338718Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8339338Z dist init r=0, world=2 2022-12-01T10:46:50.8339591Z dist init r=1, world=2 2022-12-01T10:46:50.8339832Z ok (3.511s) 2022-12-01T10:46:50.8340319Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8340954Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21699 2022-12-01T10:46:50.8341487Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21700 2022-12-01T10:46:50.8342156Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8342594Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8343359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8343888Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8344470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8344897Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8345464Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8346006Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8346466Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8346953Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8347616Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8348303Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8348807Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8349274Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8350138Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8350691Z warnings.warn( 2022-12-01T10:46:50.8351433Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8351977Z warnings.warn( 2022-12-01T10:46:50.8352231Z dist init r=1, world=2 2022-12-01T10:46:50.8352482Z dist init r=0, world=2 2022-12-01T10:46:50.8352704Z ok (3.511s) 2022-12-01T10:46:50.8353176Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8353828Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21778 2022-12-01T10:46:50.8354339Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21779 2022-12-01T10:46:50.8355018Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8355476Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8356050Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8356500Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8357070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8357508Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8358070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8358515Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8358972Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8359472Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8360111Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8360802Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8361325Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8361794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8363043Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8363705Z warnings.warn( 2022-12-01T10:46:50.8364481Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8365020Z warnings.warn( 2022-12-01T10:46:50.8365890Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8367140Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8368380Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8368994Z dist init r=0, world=2 2022-12-01T10:46:50.8369245Z dist init r=1, world=2 2022-12-01T10:46:50.8369468Z ok (3.511s) 2022-12-01T10:46:50.8369947Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8370602Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21857 2022-12-01T10:46:50.8371131Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21858 2022-12-01T10:46:50.8371723Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8372260Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8372917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8373528Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8374111Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8374549Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8375111Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8375556Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8376013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8376517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8377171Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8377840Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8378359Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8378829Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8379677Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8380378Z warnings.warn( 2022-12-01T10:46:50.8381143Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8381684Z warnings.warn( 2022-12-01T10:46:50.8382572Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:46:50.8383792Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8384400Z dist init r=0, world=2 2022-12-01T10:46:50.8384652Z dist init r=1, world=2 2022-12-01T10:46:50.8384895Z ok (3.511s) 2022-12-01T10:46:50.8385353Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8386004Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21936 2022-12-01T10:46:50.8386529Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21937 2022-12-01T10:46:50.8387122Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8387568Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8388139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8388609Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8389228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8389692Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8390265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8390728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8391161Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8391659Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8392313Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8392986Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8393512Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8393982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8394848Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8395376Z warnings.warn( 2022-12-01T10:46:50.8396136Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8396750Z warnings.warn( 2022-12-01T10:46:50.8397640Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8398873Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8399472Z dist init r=0, world=2 2022-12-01T10:46:50.8399722Z dist init r=1, world=2 2022-12-01T10:46:50.8399958Z ok (3.611s) 2022-12-01T10:46:50.8400411Z test_fsdp_state_dict_keys_state_dict_type_local_state_dict (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22015 2022-12-01T10:46:50.8400979Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22016 2022-12-01T10:46:50.8401587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8402039Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8402778Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8403255Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8403834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8404275Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8404818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8405284Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8405821Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8406324Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8406989Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8407677Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8408202Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8408652Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8409873Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8410633Z warnings.warn( 2022-12-01T10:46:50.8411744Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. 
We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8412492Z warnings.warn( 2022-12-01T10:46:50.8412723Z dist init r=0, world=2 2022-12-01T10:46:50.8413057Z dist init r=1, world=2 2022-12-01T10:46:50.8413295Z ok (3.410s) 2022-12-01T10:46:50.8413758Z test_fsdp_state_dict_keys_state_dict_type_sharded_state_dict (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22094 2022-12-01T10:46:50.8414321Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22095 2022-12-01T10:46:50.8414928Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8415376Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8415924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8416393Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8416969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8417418Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8417968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8418431Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8418885Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8419368Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8420022Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8420706Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8421227Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8421686Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8422999Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8423775Z warnings.warn( 2022-12-01T10:46:50.8424878Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
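The FSDP warning repeated above suggests passing a device_id so the CPU-resident test module is moved to the GPU before flattening and sharding. A minimal sketch of that recommendation, assuming a single current CUDA device and a toy module (neither is taken from this log):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_gpu(module: nn.Module) -> FSDP:
    # device_id tells FSDP which GPU to move the module to before sharding,
    # avoiding the slower CPU flattening path the warning describes; it is also
    # what lets sync_module_states=True broadcast the initial weights over NCCL.
    return FSDP(module,
                device_id=torch.cuda.current_device(),
                sync_module_states=True)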
2022-12-01T10:46:50.8425631Z warnings.warn( 2022-12-01T10:46:50.8426373Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8426915Z warnings.warn( 2022-12-01T10:46:50.8427659Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8428199Z warnings.warn( 2022-12-01T10:46:50.8428435Z dist init r=0, world=2 2022-12-01T10:46:50.8428688Z dist init r=1, world=2 2022-12-01T10:46:50.8428926Z ok (3.510s) 2022-12-01T10:46:50.8429371Z test_fsdp_state_dict_keys_state_dict_type_state_dict (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22173 2022-12-01T10:46:50.8429994Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22174 2022-12-01T10:46:50.8430607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8431051Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8431605Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8432070Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8432643Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8433068Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8433636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8434099Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8434554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8435037Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8435684Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8436374Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8436894Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8437345Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8438614Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
2022-12-01T10:46:50.8439392Z warnings.warn( 2022-12-01T10:46:50.8440489Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8441238Z warnings.warn( 2022-12-01T10:46:50.8441975Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8442756Z warnings.warn( 2022-12-01T10:46:50.8443518Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8444059Z warnings.warn( 2022-12-01T10:46:50.8444289Z dist init r=1, world=2 2022-12-01T10:46:50.8444538Z dist init r=0, world=2 2022-12-01T10:46:50.8444777Z ok (3.410s) 2022-12-01T10:46:50.8445176Z test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_both (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8445927Z Tests saving the state dict, zeroing a target model's parameters, and ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22252 2022-12-01T10:46:50.8446560Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22253 2022-12-01T10:46:50.8447170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8447602Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8448171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8448638Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8449198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8449638Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8450202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8450661Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8451103Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8451606Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8452264Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8452948Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:46:50.8453447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8453919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8454782Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8455314Z warnings.warn( 2022-12-01T10:46:50.8456138Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8456698Z warnings.warn( 2022-12-01T10:46:50.8456946Z dist init r=0, world=2 2022-12-01T10:46:50.8457177Z dist init r=1, world=2 2022-12-01T10:46:50.8457412Z ok (3.410s) 2022-12-01T10:46:50.8457832Z test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_first (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8458569Z Tests saving the state dict, zeroing a target model's parameters, and ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22331 2022-12-01T10:46:50.8459110Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22332 2022-12-01T10:46:50.8459716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8460166Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8460719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8461185Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8461753Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8462194Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8462741Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8463199Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8463717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8464206Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8464868Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8465553Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8466076Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8466527Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8467383Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:46:50.8467937Z warnings.warn( 2022-12-01T10:46:50.8468691Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8469216Z warnings.warn( 2022-12-01T10:46:50.8469461Z dist init r=0, world=2 2022-12-01T10:46:50.8469711Z dist init r=1, world=2 2022-12-01T10:46:50.8469928Z ok (3.510s) 2022-12-01T10:46:50.8470347Z test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_second (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8471099Z Tests saving the state dict, zeroing a target model's parameters, and ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22410 2022-12-01T10:46:50.8471638Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22411 2022-12-01T10:46:50.8472228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8472677Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8473308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8473790Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8474353Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8474794Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8475360Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8475807Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8476256Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8476762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8477413Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8478085Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8478603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8479069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8479934Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8480531Z warnings.warn( 2022-12-01T10:46:50.8481294Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:46:50.8481837Z warnings.warn( 2022-12-01T10:46:50.8482066Z dist init r=1, world=2 2022-12-01T10:46:50.8482320Z dist init r=0, world=2 2022-12-01T10:46:50.8482727Z ok (3.411s) 2022-12-01T10:46:50.8483137Z test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8483862Z Tests saving the state dict, zeroing a target model's parameters, and ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22489 2022-12-01T10:46:50.8484410Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22490 2022-12-01T10:46:50.8485015Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8485449Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8486026Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8486496Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8487070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8487493Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8488062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8488521Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8488979Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8489465Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8490210Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8490925Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8491431Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8491898Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8492756Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8493299Z warnings.warn( 2022-12-01T10:46:50.8494037Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8494585Z warnings.warn( 2022-12-01T10:46:50.8494832Z dist init r=1, world=2 2022-12-01T10:46:50.8495081Z dist init r=0, world=2 2022-12-01T10:46:50.8495299Z ok (3.510s) 2022-12-01T10:46:50.8495708Z test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_first (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8496449Z Tests saving the state dict, zeroing a target model's parameters, and ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22568 2022-12-01T10:46:50.8496967Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22569 2022-12-01T10:46:50.8497570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8498103Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8498685Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8499137Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8499714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8500158Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8500711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8501173Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8501623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8502122Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8502772Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8503459Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8503981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8504450Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8505297Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8505839Z warnings.warn( 2022-12-01T10:46:50.8506594Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8507132Z warnings.warn( 2022-12-01T10:46:50.8507414Z dist init r=0, world=2 2022-12-01T10:46:50.8507677Z dist init r=1, world=2 2022-12-01T10:46:50.8507914Z ok (3.510s) 2022-12-01T10:46:50.8508309Z test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_second (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8509054Z Tests saving the state dict, zeroing a target model's parameters, and ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22647 2022-12-01T10:46:50.8509592Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22648 2022-12-01T10:46:50.8510192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8510625Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8511201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8511667Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8512228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8512668Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8513230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8513690Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8514120Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8514618Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8515339Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8516031Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8516531Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8516998Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8517866Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8518412Z warnings.warn( 2022-12-01T10:46:50.8519147Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8519690Z warnings.warn( 2022-12-01T10:46:50.8519935Z dist init r=0, world=2 2022-12-01T10:46:50.8520170Z dist init r=1, world=2 2022-12-01T10:46:50.8520406Z ok (3.510s) 2022-12-01T10:46:50.8520856Z test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8521481Z Test that saving after some training results in params being updated as ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22726 2022-12-01T10:46:50.8522020Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22727 2022-12-01T10:46:50.8522678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8523355Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8523927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8524473Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8525070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8525512Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8526066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8526526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8526974Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8527454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8528107Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8528802Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8529325Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8529780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8530637Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8531184Z warnings.warn( 2022-12-01T10:46:50.8531933Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8532545Z warnings.warn( 2022-12-01T10:46:50.8533326Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8533882Z warnings.warn( 2022-12-01T10:46:50.8534641Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:46:50.8535163Z warnings.warn( 2022-12-01T10:46:50.8535408Z dist init r=1, world=2 2022-12-01T10:46:50.8535655Z dist init r=0, world=2 2022-12-01T10:46:50.8535873Z ok (3.912s) 2022-12-01T10:46:50.8536321Z test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8536969Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22809 2022-12-01T10:46:50.8537507Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22810 2022-12-01T10:46:50.8538101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8538545Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8539113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8539581Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8540140Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8540583Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8541148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8541647Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8542117Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8542616Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8543272Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8543942Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8544458Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8544929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8545283Z dist init r=0, world=2 2022-12-01T10:46:50.8545515Z dist init r=1, world=2 2022-12-01T10:46:50.8545753Z ok (3.411s) 2022-12-01T10:46:50.8546210Z test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8546839Z Test that saving after some training results in params being updated as ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22888 2022-12-01T10:46:50.8547376Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22889 2022-12-01T10:46:50.8547983Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8548430Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8548985Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8549517Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8550096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8550522Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8551092Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8551555Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8552004Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8552487Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8553139Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8553830Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8554354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8554810Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8555664Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8556215Z warnings.warn( 2022-12-01T10:46:50.8556970Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8557499Z warnings.warn( 2022-12-01T10:46:50.8558316Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8558881Z warnings.warn( 2022-12-01T10:46:50.8559642Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:46:50.8560161Z warnings.warn( 2022-12-01T10:46:50.8560406Z dist init r=1, world=2 2022-12-01T10:46:50.8560657Z dist init r=0, world=2 2022-12-01T10:46:50.8560874Z ok (4.113s) 2022-12-01T10:46:50.8561316Z test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8561960Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22971 2022-12-01T10:46:50.8562675Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22972 2022-12-01T10:46:50.8563276Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8563721Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8564293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8564744Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8565316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8565755Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8566425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8566875Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8567328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8567827Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8568467Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8569149Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8569671Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8570140Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8570482Z dist init r=0, world=2 2022-12-01T10:46:50.8570733Z dist init r=1, world=2 2022-12-01T10:46:50.8570975Z ok (3.411s) 2022-12-01T10:46:50.8571416Z test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8572066Z Test that saving after some training results in params being updated as ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23050 2022-12-01T10:46:50.8572599Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23051 2022-12-01T10:46:50.8573205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8573635Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8574201Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8574667Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8575313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8575758Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8576328Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8576786Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8577220Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8577721Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8578374Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8579061Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8579566Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8580035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8580901Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8581450Z warnings.warn( 2022-12-01T10:46:50.8582187Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8582726Z warnings.warn( 2022-12-01T10:46:50.8583568Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8584119Z warnings.warn( 2022-12-01T10:46:50.8584869Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:46:50.8585409Z warnings.warn( 2022-12-01T10:46:50.8585654Z dist init r=0, world=2 2022-12-01T10:46:50.8585886Z dist init r=1, world=2 2022-12-01T10:46:50.8586122Z ok (3.912s) 2022-12-01T10:46:50.8586572Z test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8587216Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23133 2022-12-01T10:46:50.8587745Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23134 2022-12-01T10:46:50.8588354Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8588803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8589373Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8589824Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8590400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8590840Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8591392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8591855Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8592361Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8592871Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8593515Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8594201Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8594724Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8595192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8595534Z dist init r=1, world=2 2022-12-01T10:46:50.8595783Z dist init r=0, world=2 2022-12-01T10:46:50.8596022Z ok (3.411s) 2022-12-01T10:46:50.8596462Z test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8597107Z Test that saving after some training results in params being updated as ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23212 2022-12-01T10:46:50.8597648Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23213 2022-12-01T10:46:50.8598257Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8598686Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8599255Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8599784Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8600343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8600789Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8601352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8601808Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8602240Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8602970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8603633Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8604318Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8604827Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8605299Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8606170Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8606716Z warnings.warn( 2022-12-01T10:46:50.8607450Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8607990Z warnings.warn( 2022-12-01T10:46:50.8608752Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8609397Z warnings.warn( 2022-12-01T10:46:50.8610168Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:46:50.8610720Z warnings.warn( 2022-12-01T10:46:50.8610967Z dist init r=1, world=2 2022-12-01T10:46:50.8611199Z dist init r=0, world=2 2022-12-01T10:46:50.8611433Z ok (4.014s) 2022-12-01T10:46:50.8611884Z test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8612507Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23295 2022-12-01T10:46:50.8613048Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23296 2022-12-01T10:46:50.8613657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8614105Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8614658Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8615127Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8615701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8616144Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8616687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8617236Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8617692Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8618173Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8618830Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8619514Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8620032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8620481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8620826Z dist init r=0, world=2 2022-12-01T10:46:50.8621082Z dist init r=1, world=2 2022-12-01T10:46:50.8621301Z ok (3.411s) 2022-12-01T10:46:50.8621750Z test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8622426Z Test that saving after some training results in params being updated as ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23374 2022-12-01T10:46:50.8622966Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23375 2022-12-01T10:46:50.8623556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8624002Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8624568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8625035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8625594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8626091Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8626671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8627113Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8627567Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8628065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8628717Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8629386Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8629906Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8630376Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8631242Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8631766Z warnings.warn( 2022-12-01T10:46:50.8632520Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8633058Z warnings.warn( 2022-12-01T10:46:50.8633822Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8634417Z warnings.warn( 2022-12-01T10:46:50.8635193Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:46:50.8635737Z warnings.warn( 2022-12-01T10:46:50.8635984Z dist init r=0, world=2 2022-12-01T10:46:50.8636217Z dist init r=1, world=2 2022-12-01T10:46:50.8636454Z ok (4.113s) 2022-12-01T10:46:50.8636893Z test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8637511Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23457 2022-12-01T10:46:50.8638052Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23458 2022-12-01T10:46:50.8638662Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8639114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8639671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8639863Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8640230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8640402Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8640774Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8640963Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8641211Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8641495Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8641918Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8642366Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8642796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8643032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8643658Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8643777Z warnings.warn( 2022-12-01T10:46:50.8644401Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8644514Z warnings.warn( 2022-12-01T10:46:50.8645145Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:46:50.8645238Z warnings.warn( 2022-12-01T10:46:50.8645862Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8646061Z warnings.warn( 2022-12-01T10:46:50.8646175Z dist init r=0, world=2 2022-12-01T10:46:50.8646284Z dist init r=1, world=2 2022-12-01T10:46:50.8646387Z ok (4.013s) 2022-12-01T10:46:50.8646702Z test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8647020Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23540 2022-12-01T10:46:50.8647222Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23541 2022-12-01T10:46:50.8647600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8647773Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8648148Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8648340Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8648705Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8648878Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8649249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8649438Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8649665Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8649909Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8650306Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8650703Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8650999Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8651239Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8651864Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8651975Z warnings.warn( 2022-12-01T10:46:50.8652593Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:46:50.8652691Z warnings.warn( 2022-12-01T10:46:50.8653325Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8653435Z warnings.warn( 2022-12-01T10:46:50.8654059Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8654168Z warnings.warn( 2022-12-01T10:46:50.8654279Z dist init r=1, world=2 2022-12-01T10:46:50.8654387Z dist init r=0, world=2 2022-12-01T10:46:50.8654485Z ok (4.012s) 2022-12-01T10:46:50.8654800Z test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8655163Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23623 2022-12-01T10:46:50.8655383Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23624 2022-12-01T10:46:50.8655760Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8655934Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8656309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8656498Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8656863Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8657034Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8657390Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8657580Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8657828Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8658072Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8658473Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8658868Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8659096Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8659324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8659999Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:46:50.8660124Z warnings.warn( 2022-12-01T10:46:50.8660725Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8660834Z warnings.warn( 2022-12-01T10:46:50.8661464Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8661573Z warnings.warn( 2022-12-01T10:46:50.8662200Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8662317Z warnings.warn( 2022-12-01T10:46:50.8662430Z dist init r=0, world=2 2022-12-01T10:46:50.8662539Z dist init r=1, world=2 2022-12-01T10:46:50.8662621Z ok (4.012s) 2022-12-01T10:46:50.8662933Z test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_False_fsdp_root_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8663365Z Tests that FSDP's state_dict can be loaded into a local model. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23706 2022-12-01T10:46:50.8663584Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23707 2022-12-01T10:46:50.8663949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8664182Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8664569Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8664758Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8665120Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8665275Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8665647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8665836Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8666080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8666324Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8666732Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8667130Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
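The TestFSDPStateDict cases above are parametrized over FSDP's state-dict types (state_dict, local_state_dict, sharded_state_dict). A minimal sketch of the save/load path they exercise, assuming an initialized process group and an already FSDP-wrapped `fsdp_model`; the names are illustrative, not from this log:

    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

    # Choose how state_dict() is materialized: FULL_STATE_DICT (unsharded),
    # LOCAL_STATE_DICT, or SHARDED_STATE_DICT, matching the parametrized tests.
    with FSDP.state_dict_type(fsdp_model, StateDictType.SHARDED_STATE_DICT):
        sharded_sd = fsdp_model.state_dict()    # save per-rank shards
        fsdp_model.load_state_dict(sharded_sd)  # load back into the wrapped model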
2022-12-01T10:46:50.8667359Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8667584Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8668188Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8668300Z warnings.warn( 2022-12-01T10:46:50.8668906Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8669019Z warnings.warn( 2022-12-01T10:46:50.8669700Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8669822Z warnings.warn( 2022-12-01T10:46:50.8670455Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8670564Z warnings.warn( 2022-12-01T10:46:50.8670675Z dist init r=0, world=2 2022-12-01T10:46:50.8670766Z dist init r=1, world=2 2022-12-01T10:46:50.8670866Z ok (4.012s) 2022-12-01T10:46:50.8671182Z test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_False_fsdp_root_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8671615Z Tests that FSDP's state_dict can be loaded into a local model. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23789 2022-12-01T10:46:50.8671832Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23790 2022-12-01T10:46:50.8672197Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8672370Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8672744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8672935Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8673279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8673513Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8673893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8674081Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8674324Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8674567Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8674967Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
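The deprecation warnings above point at the public replacements for the private collectives `_all_gather_base` and `_reduce_scatter_base`. A minimal sketch of the renamed calls, assuming an initialized process group and equally sized tensors on every rank; the tensor shapes are illustrative:

    import torch
    import torch.distributed as dist

    world_size = dist.get_world_size()
    inp = torch.ones(4, device="cuda")

    # Replacement for torch.distributed._all_gather_base
    gathered = torch.empty(world_size * 4, device="cuda")
    dist.all_gather_into_tensor(gathered, inp)

    # Replacement for torch.distributed._reduce_scatter_base
    scattered = torch.empty(4, device="cuda")
    dist.reduce_scatter_tensor(scattered, torch.ones(world_size * 4, device="cuda"))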
2022-12-01T10:46:50.8675361Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8675588Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8675801Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8676782Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8676900Z warnings.warn( 2022-12-01T10:46:50.8677868Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8677982Z warnings.warn( 2022-12-01T10:46:50.8678660Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8678779Z warnings.warn( 2022-12-01T10:46:50.8679397Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8679503Z warnings.warn( 2022-12-01T10:46:50.8680129Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8680241Z warnings.warn( 2022-12-01T10:46:50.8680866Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8680957Z warnings.warn( 2022-12-01T10:46:50.8681066Z dist init r=1, world=2 2022-12-01T10:46:50.8681174Z dist init r=0, world=2 2022-12-01T10:46:50.8681275Z ok (4.112s) 2022-12-01T10:46:50.8681585Z test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_True_fsdp_root_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8682014Z Tests that FSDP's state_dict can be loaded into a local model. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23872 2022-12-01T10:46:50.8682233Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23873 2022-12-01T10:46:50.8682893Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8683058Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8683440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8683629Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8683996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8684168Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8684540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8684726Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8684976Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8685201Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8685608Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8686008Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8686238Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8686464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8686576Z dist init r=1, world=2 2022-12-01T10:46:50.8686683Z dist init r=0, world=2 2022-12-01T10:46:50.8686785Z ok (3.411s) 2022-12-01T10:46:50.8687078Z test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_True_fsdp_root_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8687603Z Tests that FSDP's state_dict can be loaded into a local model. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23951 2022-12-01T10:46:50.8687838Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23952 2022-12-01T10:46:50.8688213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8688390Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8688767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8688956Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8689317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8689492Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8689845Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8690035Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8690283Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8690526Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8690921Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8691312Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8691541Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8691845Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8691956Z dist init r=1, world=2 2022-12-01T10:46:50.8692048Z dist init r=0, world=2 2022-12-01T10:46:50.8692148Z ok (3.511s) 2022-12-01T10:46:50.8692455Z test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_False_fsdp_root_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8692887Z Tests that FSDP's state_dict can be loaded into a local model. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24030 2022-12-01T10:46:50.8693102Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24031 2022-12-01T10:46:50.8693471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8693645Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8694022Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8694198Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8694564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8694740Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8695114Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8695301Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8695545Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8695785Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8696183Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8696577Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8696836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8697072Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8697698Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8697811Z warnings.warn( 2022-12-01T10:46:50.8698431Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8698545Z warnings.warn( 2022-12-01T10:46:50.8699177Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8699288Z warnings.warn( 2022-12-01T10:46:50.8699916Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:46:50.8700024Z warnings.warn( 2022-12-01T10:46:50.8700118Z dist init r=1, world=2 2022-12-01T10:46:50.8700226Z dist init r=0, world=2 2022-12-01T10:46:50.8700326Z ok (3.912s) 2022-12-01T10:46:50.8700624Z test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_False_fsdp_root_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8701112Z Tests that FSDP's state_dict can be loaded into a local model. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24113 2022-12-01T10:46:50.8701334Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24114 2022-12-01T10:46:50.8701705Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8701879Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8702230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8702402Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8702779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8702971Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8703348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8703538Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8703786Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8704033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8704437Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8704816Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8705046Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8705272Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8706305Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8706426Z warnings.warn( 2022-12-01T10:46:50.8707400Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
2022-12-01T10:46:50.8707510Z warnings.warn( 2022-12-01T10:46:50.8708134Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8708242Z warnings.warn( 2022-12-01T10:46:50.8708855Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8708966Z warnings.warn( 2022-12-01T10:46:50.8709580Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8709688Z warnings.warn( 2022-12-01T10:46:50.8710312Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8710479Z warnings.warn( 2022-12-01T10:46:50.8710594Z dist init r=0, world=2 2022-12-01T10:46:50.8710702Z dist init r=1, world=2 2022-12-01T10:46:50.8710802Z ok (4.012s) 2022-12-01T10:46:50.8711102Z test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_True_fsdp_root_False (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8711534Z Tests that FSDP's state_dict can be loaded into a local model. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24196 2022-12-01T10:46:50.8711737Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24197 2022-12-01T10:46:50.8712105Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8712277Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8712658Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8712850Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8713219Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8713387Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8713759Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8713930Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8714179Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8714421Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8714825Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8715272Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
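The fully_sharded_data_parallel.py warning repeated above recommends passing the device_id argument so that flattening and sharding run on the GPU rather than the CPU. A minimal sketch of that recommendation, assuming a CUDA device and an already-initialized process group; the module and the way the local rank is obtained are placeholders for illustration:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    torch.cuda.set_device(local_rank)   # local_rank comes from the launcher; assumption for illustration
    model = torch.nn.Linear(4, 4)       # placeholder module

    # device_id moves the module to the GPU before flattening/sharding, which is
    # also what sync_module_states=True needs for its GPU communication.
    fsdp_model = FSDP(model, device_id=torch.cuda.current_device(), sync_module_states=True)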
2022-12-01T10:46:50.8715511Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8715737Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8716362Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8716474Z warnings.warn( 2022-12-01T10:46:50.8717087Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8717184Z warnings.warn( 2022-12-01T10:46:50.8717814Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8717923Z warnings.warn( 2022-12-01T10:46:50.8718550Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8718659Z warnings.warn( 2022-12-01T10:46:50.8718770Z dist init r=0, world=2 2022-12-01T10:46:50.8718878Z dist init r=1, world=2 2022-12-01T10:46:50.8718977Z ok (4.014s) 2022-12-01T10:46:50.8719256Z test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_True_fsdp_root_True (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8719758Z Tests that FSDP's state_dict can be loaded into a local model. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24279 2022-12-01T10:46:50.8719976Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24280 2022-12-01T10:46:50.8720345Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8720519Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8720896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8721086Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8721447Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8721623Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8721987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8722176Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8722463Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8722711Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8723340Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
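The *_rank0_and_offload_* test names exercised here cover FSDP's configurable state_dict behaviour. A sketch of gathering a full state dict only on rank 0 with CPU offload, based on the public FSDP API at the time of this run; fsdp_model is assumed to be an already-wrapped module:

    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.fsdp import StateDictType, FullStateDictConfig

    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT, cfg):
        state_dict = fsdp_model.state_dict()  # populated only on rank 0, with tensors offloaded to CPU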
2022-12-01T10:46:50.8723737Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8723966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8724196Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8725254Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8725384Z warnings.warn( 2022-12-01T10:46:50.8726363Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8726460Z warnings.warn( 2022-12-01T10:46:50.8727080Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8727190Z warnings.warn( 2022-12-01T10:46:50.8727804Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8727913Z warnings.warn( 2022-12-01T10:46:50.8728540Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8728722Z warnings.warn( 2022-12-01T10:46:50.8729361Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8729471Z warnings.warn( 2022-12-01T10:46:50.8729582Z dist init r=0, world=2 2022-12-01T10:46:50.8729676Z dist init r=1, world=2 2022-12-01T10:46:50.8729775Z ok (4.012s) 2022-12-01T10:46:50.8729979Z test_state_dict_rank0_offload_save_load_flow (__main__.TestFSDPStateDict) 2022-12-01T10:46:50.8730286Z Tests saving a model checkpoint only on rank 0 and loading it only ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24362 2022-12-01T10:46:50.8730505Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24363 2022-12-01T10:46:50.8730879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8731057Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8731428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8731587Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8731966Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8732155Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8732527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8732715Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8732960Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8733204Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8733608Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8734040Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8734283Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8734508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8735132Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8735244Z warnings.warn( 2022-12-01T10:46:50.8735862Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8735973Z warnings.warn( 2022-12-01T10:46:50.8736089Z dist init r=1, world=2 2022-12-01T10:46:50.8736197Z dist init r=0, world=2 2022-12-01T10:46:50.8736280Z ok (3.711s) 2022-12-01T10:46:50.8736626Z test_state_dict_save_load_flow_state_dict_type_local_state_dict (__main__.TestFSDPStateDict) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24441 2022-12-01T10:46:50.8736844Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24442 2022-12-01T10:46:50.8737214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8737389Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8737765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8738012Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8738382Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8738552Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8738906Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8739094Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8739339Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8739585Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8739981Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8740378Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8740607Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8740836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8741813Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8741930Z warnings.warn( 2022-12-01T10:46:50.8742933Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8743057Z warnings.warn( 2022-12-01T10:46:50.8743682Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8743789Z warnings.warn( 2022-12-01T10:46:50.8744405Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8744518Z warnings.warn( 2022-12-01T10:46:50.8745149Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8745259Z warnings.warn( 2022-12-01T10:46:50.8745886Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8745994Z warnings.warn( 2022-12-01T10:46:50.8746215Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8746452Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8747209Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8748019Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8748258Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8748488Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8748600Z dist init r=1, world=2 2022-12-01T10:46:50.8748711Z dist init r=0, world=2 2022-12-01T10:46:50.8748810Z ok (4.011s) 2022-12-01T10:46:50.8749159Z test_state_dict_save_load_flow_state_dict_type_sharded_state_dict (__main__.TestFSDPStateDict) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24524 2022-12-01T10:46:50.8749365Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24525 2022-12-01T10:46:50.8749740Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8749916Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8750290Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8750479Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8750843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8751015Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8751389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8751578Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8751867Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8752124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8752527Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8752919Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8753146Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8753373Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8754347Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8754465Z warnings.warn( 2022-12-01T10:46:50.8755438Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8755550Z warnings.warn( 2022-12-01T10:46:50.8756169Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8756323Z warnings.warn( 2022-12-01T10:46:50.8756947Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8757058Z warnings.warn( 2022-12-01T10:46:50.8757685Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8757796Z warnings.warn( 2022-12-01T10:46:50.8758425Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8758537Z warnings.warn( 2022-12-01T10:46:50.8758776Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8759014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8759770Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8760503Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8760725Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8761007Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8761129Z dist init r=0, world=2 2022-12-01T10:46:50.8761239Z dist init r=1, world=2 2022-12-01T10:46:50.8761337Z ok (4.010s) 2022-12-01T10:46:50.8761680Z test_state_dict_save_load_flow_state_dict_type_state_dict (__main__.TestFSDPStateDict) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24607 2022-12-01T10:46:50.8761898Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24608 2022-12-01T10:46:50.8762270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8762647Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8763051Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8763240Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8763603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8763776Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8764147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8764334Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8764580Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8764824Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8765210Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8765706Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8765936Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8766164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8767142Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8767258Z warnings.warn( 2022-12-01T10:46:50.8768232Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8768343Z warnings.warn( 2022-12-01T10:46:50.8768961Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8769068Z warnings.warn( 2022-12-01T10:46:50.8769682Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8769794Z warnings.warn( 2022-12-01T10:46:50.8770470Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8770593Z warnings.warn( 2022-12-01T10:46:50.8771228Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8771338Z warnings.warn( 2022-12-01T10:46:50.8771575Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8771811Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8772566Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8773306Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:46:50.8773543Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8773774Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:46:50.8773869Z dist init r=1, world=2 2022-12-01T10:46:50.8773977Z dist init r=0, world=2 2022-12-01T10:46:50.8774075Z ok (4.011s) 2022-12-01T10:46:50.8774498Z test_state_dict_skip_module_state_dict_type_local_state_dict_double_nest_True (__main__.TestFSDPStateDict) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24690 2022-12-01T10:46:50.8774721Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24691 2022-12-01T10:46:50.8775098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8775273Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8775651Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8775824Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8776192Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8776367Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8776744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8776934Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8777182Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8777423Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8777824Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8778220Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8778430Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8778657Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8779332Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8779454Z warnings.warn( 2022-12-01T10:46:50.8780080Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8780190Z warnings.warn( 2022-12-01T10:46:50.8780816Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8780926Z warnings.warn( 2022-12-01T10:46:50.8781549Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8781664Z warnings.warn( 2022-12-01T10:46:50.8781759Z dist init r=1, world=2 2022-12-01T10:46:50.8781867Z dist init r=0, world=2 2022-12-01T10:46:50.8781967Z ok (4.012s) 2022-12-01T10:46:50.8782337Z test_state_dict_skip_module_state_dict_type_sharded_state_dict_double_nest_True (__main__.TestFSDPStateDict) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24773 2022-12-01T10:46:50.8782557Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24774 2022-12-01T10:46:50.8782927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8783103Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8783544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8783716Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8784077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8784249Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8784621Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8784807Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8785052Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8785296Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8785695Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8786097Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8786312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8786540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8787166Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8787282Z warnings.warn( 2022-12-01T10:46:50.8787899Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8788011Z warnings.warn( 2022-12-01T10:46:50.8788694Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8788812Z warnings.warn( 2022-12-01T10:46:50.8789442Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8789551Z warnings.warn( 2022-12-01T10:46:50.8789644Z dist init r=0, world=2 2022-12-01T10:46:50.8789752Z dist init r=1, world=2 2022-12-01T10:46:50.8789850Z ok (4.012s) 2022-12-01T10:46:50.8790212Z test_state_dict_skip_module_state_dict_type_state_dict_double_nest_True (__main__.TestFSDPStateDict) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24856 2022-12-01T10:46:50.8790434Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24857 2022-12-01T10:46:50.8790808Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8790987Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8791365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8791539Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8791902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8792073Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8792443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8792687Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8792937Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8793180Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8793582Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8793961Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8794191Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8794415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8795034Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8795150Z warnings.warn( 2022-12-01T10:46:50.8795771Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8795883Z warnings.warn( 2022-12-01T10:46:50.8796511Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8796620Z warnings.warn( 2022-12-01T10:46:50.8797242Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:46:50.8797337Z warnings.warn( 2022-12-01T10:46:50.8797448Z dist init r=0, world=2 2022-12-01T10:46:50.8797609Z dist init r=1, world=2 2022-12-01T10:46:50.8797719Z ok (3.912s) 2022-12-01T10:46:50.8798017Z test_state_dict_type (__main__.TestFSDPStateDict) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24939 2022-12-01T10:46:50.8798235Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24940 2022-12-01T10:46:50.8798609Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8798784Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8799143Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8799333Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8799698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8799873Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8800242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8800428Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8800673Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8800917Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8801315Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8801691Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8801981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8802213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8802325Z dist init r=0, world=2 2022-12-01T10:46:50.8802640Z dist init r=1, world=2 2022-12-01T10:46:50.8802748Z ok (3.409s) 2022-12-01T10:46:50.8803154Z test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_False_ignore_inner_False (__main__.TestFSDPStateDict) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25018 2022-12-01T10:46:50.8803372Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25019 2022-12-01T10:46:50.8803728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8803903Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8804285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8804476Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8804837Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8805008Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8805378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8805562Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8805795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8806031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8806438Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8806908Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8807150Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8807375Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8808363Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8808480Z warnings.warn( 2022-12-01T10:46:50.8809460Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8809571Z warnings.warn( 2022-12-01T10:46:50.8810193Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8810303Z warnings.warn( 2022-12-01T10:46:50.8810897Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8811079Z warnings.warn( 2022-12-01T10:46:50.8811194Z dist init r=1, world=2 2022-12-01T10:46:50.8811306Z dist init r=0, world=2 2022-12-01T10:46:50.8811405Z ok (3.411s) 2022-12-01T10:46:50.8811805Z test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_False_ignore_inner_True (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25097 2022-12-01T10:46:50.8812022Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25098 2022-12-01T10:46:50.8812397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8812554Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8812932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8813128Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8813494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8813668Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8814038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8814224Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8814468Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8814715Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8815097Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8815501Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8815794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8816030Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8816786Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1183: UserWarning: Trying to ignore the top-level module passed into the FSDP constructor itself will result in all parameters being ignored and is not well-supported: Linear(in_features=4, out_features=4, bias=True) 2022-12-01T10:46:50.8816901Z warnings.warn( 2022-12-01T10:46:50.8817655Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1183: UserWarning: Trying to ignore the top-level module passed into the FSDP constructor itself will result in all parameters being ignored and is not well-supported: Linear(in_features=4, out_features=4, bias=True) 2022-12-01T10:46:50.8817769Z warnings.warn( 2022-12-01T10:46:50.8817880Z dist init r=0, world=2 2022-12-01T10:46:50.8817977Z dist init r=1, world=2 2022-12-01T10:46:50.8818078Z ok (3.411s) 2022-12-01T10:46:50.8818478Z test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_False (__main__.TestFSDPStateDict) ... 
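The fully_sharded_data_parallel.py:1427 warning repeated above recommends passing the `device_id` argument so that FSDP moves the module to GPU before flattening and sharding. A minimal sketch of that recommended construction, assuming the process group is already initialized and each rank owns one CUDA device (the Linear module is a placeholder mirroring the tests):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes torch.distributed is already initialized, as in the tests above.
device = torch.cuda.current_device()
model = torch.nn.Linear(4, 4)  # placeholder module

# device_id lets FSDP put the module on GPU before flattening/sharding,
# which is also what makes sync_module_states=True usable (it needs GPU
# communication, per the warning text).
fsdp_model = FSDP(model, device_id=device, sync_module_states=True)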
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25176 2022-12-01T10:46:50.8818698Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25177 2022-12-01T10:46:50.8819065Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8819239Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8819615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8819864Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8820235Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8820391Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8820767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8820955Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8821202Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8821446Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8821845Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8822277Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8822517Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8822743Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8823727Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8823823Z warnings.warn( 2022-12-01T10:46:50.8824848Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8824973Z warnings.warn( 2022-12-01T10:46:50.8825598Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8825706Z warnings.warn( 2022-12-01T10:46:50.8826317Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8826425Z warnings.warn( 2022-12-01T10:46:50.8826538Z dist init r=1, world=2 2022-12-01T10:46:50.8826653Z dist init r=0, world=2 2022-12-01T10:46:50.8826754Z ok (3.411s) 2022-12-01T10:46:50.8827137Z test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_True (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25255 2022-12-01T10:46:50.8827358Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25256 2022-12-01T10:46:50.8827731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8827906Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8828279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8828468Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8828831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8829062Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8829429Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8829619Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8829868Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8830124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8830522Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8830918Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8831145Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8831375Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8832130Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1183: UserWarning: Trying to ignore the top-level module passed into the FSDP constructor itself will result in all parameters being ignored and is not well-supported: Linear(in_features=4, out_features=4, bias=True) 2022-12-01T10:46:50.8832227Z warnings.warn( 2022-12-01T10:46:50.8832982Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1183: UserWarning: Trying to ignore the top-level module passed into the FSDP constructor itself will result in all parameters being ignored and is not well-supported: Linear(in_features=4, out_features=4, bias=True) 2022-12-01T10:46:50.8833093Z warnings.warn( 2022-12-01T10:46:50.8833203Z dist init r=1, world=2 2022-12-01T10:46:50.8833314Z dist init r=0, world=2 2022-12-01T10:46:50.8833413Z ok (3.511s) 2022-12-01T10:46:50.8833853Z test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_False_ignore_inner_False (__main__.TestFSDPStateDict) ... 
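The distributed_c10d.py:2387 deprecation warning above points from the private `_all_gather_base` to the public `torch.distributed.all_gather_into_tensor`. A minimal sketch of the public call, assuming an initialized process group and a CUDA device per rank:

import torch
import torch.distributed as dist

# Assumes dist.init_process_group(...) has already run on every rank.
world_size = dist.get_world_size()
local = torch.ones(4, device="cuda") * dist.get_rank()
# The output tensor is world_size times larger along dim 0 than the input.
gathered = torch.empty(world_size * 4, device="cuda")
dist.all_gather_into_tensor(gathered, local)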
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25334 2022-12-01T10:46:50.8834217Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25335 2022-12-01T10:46:50.8834592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8834765Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8835142Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8835331Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8835690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8835864Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8836222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8836413Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8836657Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8836900Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8837298Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8837690Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8837916Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8838203Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8839186Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8839299Z warnings.warn( 2022-12-01T10:46:50.8840257Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8840371Z warnings.warn( 2022-12-01T10:46:50.8840995Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8841104Z warnings.warn( 2022-12-01T10:46:50.8841719Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8841826Z warnings.warn( 2022-12-01T10:46:50.8841936Z dist init r=1, world=2 2022-12-01T10:46:50.8842042Z dist init r=0, world=2 2022-12-01T10:46:50.8842140Z ok (3.412s) 2022-12-01T10:46:50.8842735Z test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_False_ignore_inner_True (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25413 2022-12-01T10:46:50.8843047Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25414 2022-12-01T10:46:50.8843445Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8843621Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8843995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8844184Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8844550Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8844723Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8845100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8845275Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8845524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8845767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8846168Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8846561Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8846788Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8847013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8847851Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1183: UserWarning: Trying to ignore the top-level module passed into the FSDP constructor itself will result in all parameters being ignored and is not well-supported: Linear(in_features=4, out_features=4, bias=True) 2022-12-01T10:46:50.8847965Z warnings.warn( 2022-12-01T10:46:50.8848716Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1183: UserWarning: Trying to ignore the top-level module passed into the FSDP constructor itself will result in all parameters being ignored and is not well-supported: Linear(in_features=4, out_features=4, bias=True) 2022-12-01T10:46:50.8848808Z warnings.warn( 2022-12-01T10:46:50.8848918Z dist init r=1, world=2 2022-12-01T10:46:50.8849027Z dist init r=0, world=2 2022-12-01T10:46:50.8849125Z ok (3.411s) 2022-12-01T10:46:50.8849513Z test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_True_ignore_inner_False (__main__.TestFSDPStateDict) ... 
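The fully_sharded_data_parallel.py:1183 warning above fires because these tests deliberately put the FSDP root module itself into `ignored_modules`, which ignores every parameter. In ordinary use the ignored entry would be an inner submodule; a hedged sketch with placeholder module names, again assuming the process group is already initialized:

import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Linear(4, 4)
        self.frozen_head = nn.Linear(4, 4)  # placeholder: part to leave unsharded

model = Model()
# Ignore only an inner submodule; passing `model` itself here would trigger
# the "top-level module ... all parameters being ignored" warning seen above.
fsdp_model = FSDP(model, ignored_modules=[model.frozen_head])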
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25492 2022-12-01T10:46:50.8849739Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25493 2022-12-01T10:46:50.8850112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8850285Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8850645Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8850834Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8851196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8851366Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8851735Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8851921Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8852223Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8852476Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8852878Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8853258Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8853486Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8853712Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8854693Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8854806Z warnings.warn( 2022-12-01T10:46:50.8855778Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8855888Z warnings.warn( 2022-12-01T10:46:50.8856504Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8856684Z warnings.warn( 2022-12-01T10:46:50.8857304Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:46:50.8857413Z warnings.warn( 2022-12-01T10:46:50.8857507Z dist init r=1, world=2 2022-12-01T10:46:50.8857617Z dist init r=0, world=2 2022-12-01T10:46:50.8857716Z ok (3.411s) 2022-12-01T10:46:50.8858103Z test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_True_ignore_inner_True (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25571 2022-12-01T10:46:50.8858321Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25572 2022-12-01T10:46:50.8858698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8858878Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8859260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8859432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8859795Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8859968Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8860337Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8860525Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8860774Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8861065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8861482Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8861878Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8862088Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8862312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8863066Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1183: UserWarning: Trying to ignore the top-level module passed into the FSDP constructor itself will result in all parameters being ignored and is not well-supported: Linear(in_features=4, out_features=4, bias=True) 2022-12-01T10:46:50.8863183Z warnings.warn( 2022-12-01T10:46:50.8863933Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1183: UserWarning: Trying to ignore the top-level module passed into the FSDP constructor itself will result in all parameters being ignored and is not well-supported: Linear(in_features=4, out_features=4, bias=True) 2022-12-01T10:46:50.8864042Z warnings.warn( 2022-12-01T10:46:50.8864153Z dist init r=1, world=2 2022-12-01T10:46:50.8864262Z dist init r=0, world=2 2022-12-01T10:46:50.8864361Z ok (3.413s) 2022-12-01T10:46:50.8864650Z test_wrong_state_dict_config (__main__.TestFSDPStateDict) ... 
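The `state_dict` vs. `sharded_state_dict` parametrizations in the TestFSDPStateDict names above correspond to FSDP's StateDictType setting. A minimal sketch of switching the state-dict type with the context manager, assuming an already-constructed `fsdp_model` (import names reflect the FSDP API of this PyTorch version and may differ in later releases):

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import StateDictType, FullStateDictConfig

# `fsdp_model` is assumed to be an already-wrapped FSDP module (see sketches above).
# Full (unsharded) state dict, gathered to rank 0 and offloaded to CPU:
full_cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT, full_cfg):
    full_sd = fsdp_model.state_dict()

# Sharded state dict: each rank keeps only its own shards.
with FSDP.state_dict_type(fsdp_model, StateDictType.SHARDED_STATE_DICT):
    sharded_sd = fsdp_model.state_dict()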
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25650 2022-12-01T10:46:50.8864867Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25651 2022-12-01T10:46:50.8865302Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8865475Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8865854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8866043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8866406Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:46:50.8866578Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:46:50.8866954Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:46:50.8867127Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:46:50.8867370Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:46:50.8867618Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:46:50.8868019Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8868417Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:46:50.8868642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:46:50.8868866Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:46:50.8869845Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8869963Z warnings.warn( 2022-12-01T10:46:50.8870986Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:46:50.8871110Z warnings.warn( 2022-12-01T10:46:50.8871204Z dist init r=0, world=2 2022-12-01T10:46:50.8871312Z dist init r=1, world=2 2022-12-01T10:46:50.8871410Z ok (3.410s) 2022-12-01T10:46:50.8871434Z 2022-12-01T10:46:50.8871706Z ---------------------------------------------------------------------- 2022-12-01T10:46:50.8871826Z Ran 70 tests in 255.274s 2022-12-01T10:46:50.8871845Z 2022-12-01T10:46:50.8871935Z OK 2022-12-01T10:46:50.8871953Z 2022-12-01T10:46:50.8872078Z Generating XML reports... 
2022-12-01T10:46:50.8872532Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_state_dict/TEST-TestFSDPStateDict-20221201104235.xml 2022-12-01T10:46:50.8872553Z 2022-12-01T10:46:50.8873001Z ##[endgroup] 2022-12-01T10:46:50.8873488Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_state_dict (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_state_dict_4w1ggyy6) 2022-12-01T10:46:50.8873508Z 2022-12-01T10:46:50.8873784Z Running distributed/fsdp/test_fsdp_optim_state ... [2022-12-01 10:46:50.804090] 2022-12-01T10:46:50.8874268Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_optim_state.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:46:50.804528] 2022-12-01T10:50:23.4906817Z 2022-12-01T10:50:23.4909817Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_optim_state 2022-12-01T10:50:23.4911154Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_optim_state (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_optim_state_v66rnijf) 2022-12-01T10:50:23.4911572Z 2022-12-01T10:50:23.4911688Z Running tests... 2022-12-01T10:50:23.4912212Z ---------------------------------------------------------------------- 2022-12-01T10:50:23.4916338Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_optim_state 2022-12-01T10:50:23.4916876Z test_flatten_sharded_optim_state_dict_nested (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.4917487Z Tests :meth:`flatten_sharded_optim_state_dict` for an FSDP-root ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:50:23.4917979Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25764 2022-12-01T10:50:23.4918490Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25765 2022-12-01T10:50:23.4919144Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.4919614Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.4920205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.4920666Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.4921256Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.4921708Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.4922281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.4923114Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.4923572Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.4924084Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.4924951Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.4925676Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
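Each test above spawns two worker processes ("dist init r=0, world=2") and passes through the store-based barrier logged by distributed_c10d during process-group initialization. A stripped-down sketch of that two-rank setup; the rendezvous address, port, and the gloo backend are placeholders so the sketch runs without GPUs, not the configuration the test harness itself uses:

import os
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"  # placeholder rendezvous address
    os.environ["MASTER_PORT"] = "29500"      # placeholder port
    # init_process_group performs the store-based barrier seen in the log
    # ("Added key: store_based_barrier_key:1 ... Completed store-based barrier").
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    print(f"dist init r={rank}, world={world_size}")
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)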
2022-12-01T10:50:23.4926205Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.4927135Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.4928043Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.4928710Z warnings.warn( 2022-12-01T10:50:23.4929491Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.4930048Z warnings.warn( 2022-12-01T10:50:23.4930819Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.4931381Z warnings.warn( 2022-12-01T10:50:23.4932114Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.4932658Z warnings.warn( 2022-12-01T10:50:23.4933725Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.4934985Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.4936161Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.4936733Z warnings.warn( 2022-12-01T10:50:23.4937507Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.4938068Z warnings.warn( 2022-12-01T10:50:23.4938322Z dist init r=0, world=2 2022-12-01T10:50:23.4938555Z dist init r=1, world=2 2022-12-01T10:50:23.4938799Z ok (5.793s) 2022-12-01T10:50:23.4939153Z test_flatten_sharded_optim_state_dict_transformer (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.4939850Z Tests :meth:`flatten_sharded_optim_state_dict` for an FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25847 2022-12-01T10:50:23.4940376Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25848 2022-12-01T10:50:23.4940996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.4941457Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.4942269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.4942829Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.4943426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.4943884Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.4944461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.4944939Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.4945389Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.4945893Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.4946562Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.4947254Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.4947792Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.4948279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.4949165Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.4949717Z warnings.warn( 2022-12-01T10:50:23.4950458Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.4951080Z warnings.warn( 2022-12-01T10:50:23.4951865Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.4952429Z warnings.warn( 2022-12-01T10:50:23.4953172Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.4953725Z warnings.warn( 2022-12-01T10:50:23.4953973Z dist init r=1, world=2 2022-12-01T10:50:23.4954212Z dist init r=0, world=2 2022-12-01T10:50:23.4954453Z ok (5.114s) 2022-12-01T10:50:23.4954786Z test_full_optim_state_dict_keys (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.4955266Z Tests that the parameter keys returned by ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25930 2022-12-01T10:50:23.4955792Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25931 2022-12-01T10:50:23.4956417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.4956875Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.4957440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.4957919Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.4958510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.4958959Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.4959527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.4960137Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.4960611Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.4961113Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.4961762Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.4962797Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.4963334Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.4963791Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.4964677Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.4965232Z warnings.warn( 2022-12-01T10:50:23.4965982Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.4966523Z warnings.warn( 2022-12-01T10:50:23.4967274Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.4967826Z warnings.warn( 2022-12-01T10:50:23.4968583Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
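The distributed_c10d.py:2849 warning above points from the private `_reduce_scatter_base` to the public `torch.distributed.reduce_scatter_tensor`. A minimal sketch of the public call, assuming an initialized process group and a CUDA device per rank:

import torch
import torch.distributed as dist

# Assumes dist.init_process_group(...) has already run on every rank.
world_size = dist.get_world_size()
# The input tensor is world_size times larger along dim 0 than the output.
full = torch.ones(world_size * 4, device="cuda")
shard = torch.empty(4, device="cuda")
dist.reduce_scatter_tensor(shard, full)  # defaults to ReduceOp.SUM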
2022-12-01T10:50:23.4969248Z warnings.warn( 2022-12-01T10:50:23.4969482Z dist init r=0, world=2 2022-12-01T10:50:23.4969732Z dist init r=1, world=2 2022-12-01T10:50:23.4969972Z ok (4.011s) 2022-12-01T10:50:23.4970285Z test_full_optim_state_dict_nested_invalid (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.4970805Z Tests that :meth:`full_optim_state_dict` raises an error when ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26013 2022-12-01T10:50:23.4971328Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26014 2022-12-01T10:50:23.4971930Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.4972377Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.4972957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.4973435Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.4973994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.4974438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.4975007Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.4975450Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.4975900Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.4976396Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.4997058Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.4997943Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.4998493Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.4998975Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.4999867Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5000428Z warnings.warn( 2022-12-01T10:50:23.5001178Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5001733Z warnings.warn( 2022-12-01T10:50:23.5002914Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5003482Z warnings.warn( 2022-12-01T10:50:23.5004244Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. 
Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5004800Z warnings.warn( 2022-12-01T10:50:23.5005050Z dist init r=0, world=2 2022-12-01T10:50:23.5005287Z dist init r=1, world=2 2022-12-01T10:50:23.5005528Z ok (4.011s) 2022-12-01T10:50:23.5005837Z test_optim_input_warning (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5006488Z Tests that passing the ``optim_input`` argument into optimizer state ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26096 2022-12-01T10:50:23.5007056Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26097 2022-12-01T10:50:23.5007681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5008137Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5008701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5009178Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5009758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5010204Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5010759Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5011226Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5011686Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5012176Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5012838Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5013533Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5014061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5014517Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5015475Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5016050Z warnings.warn( 2022-12-01T10:50:23.5016815Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5017343Z warnings.warn( 2022-12-01T10:50:23.5018110Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5018665Z warnings.warn( 2022-12-01T10:50:23.5019429Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. 
Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5019962Z warnings.warn( 2022-12-01T10:50:23.5020212Z dist init r=1, world=2 2022-12-01T10:50:23.5020469Z dist init r=0, world=2 2022-12-01T10:50:23.5020691Z ok (4.112s) 2022-12-01T10:50:23.5021173Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5021839Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26179 2022-12-01T10:50:23.5022373Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26180 2022-12-01T10:50:23.5022974Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5023508Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5024094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5024568Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5025135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5025587Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5026155Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5026587Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5027016Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5027497Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5028140Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5028813Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5029341Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5029813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5030686Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5031217Z warnings.warn( 2022-12-01T10:50:23.5031982Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5032527Z warnings.warn( 2022-12-01T10:50:23.5033381Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:50:23.5033936Z warnings.warn( 2022-12-01T10:50:23.5034708Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5035255Z warnings.warn( 2022-12-01T10:50:23.5036048Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5036594Z warnings.warn( 2022-12-01T10:50:23.5037381Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5037937Z warnings.warn( 2022-12-01T10:50:23.5038183Z dist init r=1, world=2 2022-12-01T10:50:23.5038419Z dist init r=0, world=2 2022-12-01T10:50:23.5038655Z ok (4.111s) 2022-12-01T10:50:23.5039137Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5039782Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26262 2022-12-01T10:50:23.5040394Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26263 2022-12-01T10:50:23.5041016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5041470Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5042032Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5042834Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5043433Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5043860Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5044437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5044908Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5045364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5045854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5046517Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5047207Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:50:23.5047736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5048189Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5049051Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5049608Z warnings.warn( 2022-12-01T10:50:23.5050466Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5051011Z warnings.warn( 2022-12-01T10:50:23.5051789Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5052344Z warnings.warn( 2022-12-01T10:50:23.5053111Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5053641Z warnings.warn( 2022-12-01T10:50:23.5054439Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5055002Z warnings.warn( 2022-12-01T10:50:23.5055793Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5056327Z warnings.warn( 2022-12-01T10:50:23.5056578Z dist init r=0, world=2 2022-12-01T10:50:23.5056833Z dist init r=1, world=2 2022-12-01T10:50:23.5057054Z ok (4.114s) 2022-12-01T10:50:23.5057533Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5058307Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26345 2022-12-01T10:50:23.5058843Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26346 2022-12-01T10:50:23.5059445Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5059899Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5060477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5060948Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5061510Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5061957Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5062531Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5062982Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5063437Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5063939Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5064595Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5065266Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5065794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5066270Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5067206Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5067763Z warnings.warn( 2022-12-01T10:50:23.5068530Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5069072Z warnings.warn( 2022-12-01T10:50:23.5069848Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5070386Z warnings.warn( 2022-12-01T10:50:23.5071162Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5071713Z warnings.warn( 2022-12-01T10:50:23.5072506Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 
2022-12-01T10:50:23.5073048Z warnings.warn( 2022-12-01T10:50:23.5073833Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5074465Z warnings.warn( 2022-12-01T10:50:23.5074715Z dist init r=1, world=2 2022-12-01T10:50:23.5074951Z dist init r=0, world=2 2022-12-01T10:50:23.5075194Z ok (4.111s) 2022-12-01T10:50:23.5075673Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5076318Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26428 2022-12-01T10:50:23.5076848Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26429 2022-12-01T10:50:23.5077467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5077922Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5078477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5078926Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5079488Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5079908Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5080479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5080938Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5081392Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5081872Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5082847Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5083567Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5084164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5084639Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5085509Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5086060Z warnings.warn( 2022-12-01T10:50:23.5086816Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:50:23.5087336Z warnings.warn( 2022-12-01T10:50:23.5088112Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5088658Z warnings.warn( 2022-12-01T10:50:23.5089400Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5089940Z warnings.warn( 2022-12-01T10:50:23.5090728Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5091285Z warnings.warn( 2022-12-01T10:50:23.5092072Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5092677Z warnings.warn( 2022-12-01T10:50:23.5092908Z dist init r=1, world=2 2022-12-01T10:50:23.5093136Z dist init r=0, world=2 2022-12-01T10:50:23.5093345Z ok (4.111s) 2022-12-01T10:50:23.5093827Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5094484Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26511 2022-12-01T10:50:23.5095013Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26512 2022-12-01T10:50:23.5095613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5096070Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5096646Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5097095Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5097664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5098081Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5098624Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5099058Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5099485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5099969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5100676Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5101360Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
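[editorial note] The fully_sharded_data_parallel.py warning that recurs in every test here is actionable exactly as stated: the deprecated optim_input argument can simply be dropped. A hedged sketch of the call without it, assuming the PyTorch 1.13-era FSDP API where full_optim_state_dict is a static method taking the wrapped model and its optimizer (both are caller-supplied placeholders below):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def get_full_optim_state(model: FSDP, optim: torch.optim.Optimizer):
        # Old (emits the UserWarning seen in this log):
        #   FSDP.full_optim_state_dict(model, optim, optim_input=list(model.parameters()))
        # New: omit optim_input; per the warning text, behavior is unchanged.
        return FSDP.full_optim_state_dict(model, optim)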
2022-12-01T10:50:23.5101863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5102313Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5103150Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5103676Z warnings.warn( 2022-12-01T10:50:23.5104414Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5104942Z warnings.warn( 2022-12-01T10:50:23.5105691Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5106215Z warnings.warn( 2022-12-01T10:50:23.5107000Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5107533Z warnings.warn( 2022-12-01T10:50:23.5108300Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5108929Z warnings.warn( 2022-12-01T10:50:23.5109695Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5110232Z warnings.warn( 2022-12-01T10:50:23.5110451Z dist init r=0, world=2 2022-12-01T10:50:23.5110680Z dist init r=1, world=2 2022-12-01T10:50:23.5110895Z ok (4.111s) 2022-12-01T10:50:23.5111345Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5111986Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26594 2022-12-01T10:50:23.5112498Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26595 2022-12-01T10:50:23.5113094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5113518Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5114073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5114528Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5115082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5115492Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5116034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5116475Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5116969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5117463Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5118098Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5118763Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5119257Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5119705Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5120549Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5121088Z warnings.warn( 2022-12-01T10:50:23.5121817Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5122332Z warnings.warn( 2022-12-01T10:50:23.5123334Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5123862Z warnings.warn( 2022-12-01T10:50:23.5124599Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5125235Z warnings.warn( 2022-12-01T10:50:23.5126017Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 
2022-12-01T10:50:23.5126545Z warnings.warn( 2022-12-01T10:50:23.5127289Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5127811Z warnings.warn( 2022-12-01T10:50:23.5128045Z dist init r=0, world=2 2022-12-01T10:50:23.5128279Z dist init r=1, world=2 2022-12-01T10:50:23.5128489Z ok (4.211s) 2022-12-01T10:50:23.5128947Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5129600Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26677 2022-12-01T10:50:23.5130105Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26678 2022-12-01T10:50:23.5130696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5131125Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5131677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5132114Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5132670Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5133093Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5133717Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5134175Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5134602Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5135080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5135712Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5136382Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5136880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5137336Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5138176Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5138707Z warnings.warn( 2022-12-01T10:50:23.5139443Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:50:23.5139962Z warnings.warn( 2022-12-01T10:50:23.5140701Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5141307Z warnings.warn( 2022-12-01T10:50:23.5142056Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5142586Z warnings.warn( 2022-12-01T10:50:23.5143347Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5143883Z warnings.warn( 2022-12-01T10:50:23.5144649Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5145187Z warnings.warn( 2022-12-01T10:50:23.5145404Z dist init r=0, world=2 2022-12-01T10:50:23.5145628Z dist init r=1, world=2 2022-12-01T10:50:23.5145842Z ok (4.111s) 2022-12-01T10:50:23.5146292Z test_optim_state_dict_nested_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5146932Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26760 2022-12-01T10:50:23.5147441Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26761 2022-12-01T10:50:23.5148033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5148454Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5149008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5149454Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5150090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5150521Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5151070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5151509Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5151934Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5152416Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5153055Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5153723Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:50:23.5154218Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5154665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5155513Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5156038Z warnings.warn( 2022-12-01T10:50:23.5156762Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5157361Z warnings.warn( 2022-12-01T10:50:23.5158115Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5158646Z warnings.warn( 2022-12-01T10:50:23.5159384Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5159902Z warnings.warn( 2022-12-01T10:50:23.5160665Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5161197Z warnings.warn( 2022-12-01T10:50:23.5161959Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5162803Z warnings.warn( 2022-12-01T10:50:23.5163034Z dist init r=0, world=2 2022-12-01T10:50:23.5163257Z dist init r=1, world=2 2022-12-01T10:50:23.5163473Z ok (4.111s) 2022-12-01T10:50:23.5163938Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5164584Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26843 2022-12-01T10:50:23.5165083Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26844 2022-12-01T10:50:23.5165690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5166116Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5166758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5167220Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5167777Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5168195Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5168735Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5169176Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5169606Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5170094Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5170730Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5171404Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5171906Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5172358Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5173198Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5173817Z warnings.warn( 2022-12-01T10:50:23.5174561Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5175077Z warnings.warn( 2022-12-01T10:50:23.5175818Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5176347Z warnings.warn( 2022-12-01T10:50:23.5177086Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5177600Z warnings.warn( 2022-12-01T10:50:23.5178361Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 
2022-12-01T10:50:23.5178899Z warnings.warn( 2022-12-01T10:50:23.5179654Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5180172Z warnings.warn( 2022-12-01T10:50:23.5181029Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5182311Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5182928Z dist init r=0, world=2 2022-12-01T10:50:23.5183157Z dist init r=1, world=2 2022-12-01T10:50:23.5183367Z ok (4.211s) 2022-12-01T10:50:23.5183828Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_False_rank0_only_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5184469Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26926 2022-12-01T10:50:23.5184977Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26927 2022-12-01T10:50:23.5185563Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5185996Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5186552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5187005Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5187559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5187977Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5188523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5188961Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5189394Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5189955Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5190602Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5191260Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
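[editorial note] The parametrized test names above sweep StateDictType.FULL_STATE_DICT against StateDictType.SHARDED_STATE_DICT together with rank0_only and multiple param groups. A rough sketch of what those two optimizer-state paths look like from user code, assuming the PyTorch 1.13-era FSDP methods full_optim_state_dict and sharded_optim_state_dict exist as exercised by these tests; the model, optimizer, and file names are placeholders:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def save_optim_state(model: FSDP, optim: torch.optim.Optimizer,
                         sharded: bool, rank: int) -> None:
        if sharded:
            # Each rank keeps only its own shard of the optimizer state.
            osd = FSDP.sharded_optim_state_dict(model, optim)
            torch.save(osd, f"optim_shard_rank{rank}.pt")
        else:
            # Full state is gathered; with rank0_only=True only rank 0 holds it.
            osd = FSDP.full_optim_state_dict(model, optim, rank0_only=True)
            if rank == 0:
                torch.save(osd, "optim_full.pt")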
2022-12-01T10:50:23.5191763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5192215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5193058Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5193582Z warnings.warn( 2022-12-01T10:50:23.5194322Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5194852Z warnings.warn( 2022-12-01T10:50:23.5195603Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5196125Z warnings.warn( 2022-12-01T10:50:23.5196870Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5197397Z warnings.warn( 2022-12-01T10:50:23.5198164Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5198694Z warnings.warn( 2022-12-01T10:50:23.5199521Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5200072Z warnings.warn( 2022-12-01T10:50:23.5200939Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5202151Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5202935Z dist init r=0, world=2 2022-12-01T10:50:23.5203165Z dist init r=1, world=2 2022-12-01T10:50:23.5203380Z ok (4.211s) 2022-12-01T10:50:23.5203837Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5204483Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27009 2022-12-01T10:50:23.5204989Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27010 2022-12-01T10:50:23.5205585Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5206132Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5206728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5207180Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5207734Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5208154Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5208695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5209135Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5209557Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5210033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5210670Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5211339Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5211834Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5212280Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5212611Z dist init r=0, world=2 2022-12-01T10:50:23.5212832Z dist init r=1, world=2 2022-12-01T10:50:23.5213045Z ok (3.410s) 2022-12-01T10:50:23.5213507Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_False_rank0_only_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5214144Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27088 2022-12-01T10:50:23.5214653Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27089 2022-12-01T10:50:23.5215331Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5215826Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5216376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5216838Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5217415Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5217857Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5218407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5218878Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5219337Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5219840Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5220488Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5221181Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5221705Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5222162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5222520Z dist init r=0, world=2 2022-12-01T10:50:23.5222860Z dist init r=1, world=2 2022-12-01T10:50:23.5223106Z ok (3.410s) 2022-12-01T10:50:23.5223580Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5224253Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27167 2022-12-01T10:50:23.5224784Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27168 2022-12-01T10:50:23.5225383Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5225836Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5226408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5226876Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5227437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5227888Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5228452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5228912Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5229347Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5229852Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5230513Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5231178Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5232119Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5232670Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5233576Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5234135Z warnings.warn( 2022-12-01T10:50:23.5234883Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5235431Z warnings.warn( 2022-12-01T10:50:23.5236205Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5236771Z warnings.warn( 2022-12-01T10:50:23.5237524Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5238078Z warnings.warn( 2022-12-01T10:50:23.5238879Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 
2022-12-01T10:50:23.5239441Z warnings.warn( 2022-12-01T10:50:23.5240205Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5240844Z warnings.warn( 2022-12-01T10:50:23.5241746Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5243266Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5243871Z dist init r=1, world=2 2022-12-01T10:50:23.5244132Z dist init r=0, world=2 2022-12-01T10:50:23.5244380Z ok (4.211s) 2022-12-01T10:50:23.5244868Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_True_rank0_only_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5245525Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27250 2022-12-01T10:50:23.5246062Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27251 2022-12-01T10:50:23.5246678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5247114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5247691Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5248163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5248746Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5249169Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5249828Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5250315Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5250771Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5251256Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5251925Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5252618Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:50:23.5253130Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5253611Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5254481Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5255034Z warnings.warn( 2022-12-01T10:50:23.5255780Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5256326Z warnings.warn( 2022-12-01T10:50:23.5257095Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5257745Z warnings.warn( 2022-12-01T10:50:23.5258502Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5259047Z warnings.warn( 2022-12-01T10:50:23.5259837Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5260396Z warnings.warn( 2022-12-01T10:50:23.5261163Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5261715Z warnings.warn( 2022-12-01T10:50:23.5262606Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5263846Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5264467Z dist init r=0, world=2 2022-12-01T10:50:23.5264702Z dist init r=1, world=2 2022-12-01T10:50:23.5264942Z ok (4.111s) 2022-12-01T10:50:23.5265431Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5266185Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27333 2022-12-01T10:50:23.5266719Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27334 2022-12-01T10:50:23.5267344Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5267801Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5268349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5268803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5269382Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5269862Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5270437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5270908Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5271364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5271871Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5272518Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5273209Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5273733Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5274269Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5274631Z dist init r=0, world=2 2022-12-01T10:50:23.5274894Z dist init r=1, world=2 2022-12-01T10:50:23.5275120Z ok (3.410s) 2022-12-01T10:50:23.5275611Z test_optim_state_dict_nested_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_True_rank0_only_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5276276Z Tests :meth:`full_optim_state_dict` and `sharded_optim_state_dict` ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27412 2022-12-01T10:50:23.5276806Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27413 2022-12-01T10:50:23.5277407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5277863Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5278445Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5278915Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5279476Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5279929Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5280502Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5280966Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5281401Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5281905Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5282742Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5283521Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5284074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5284551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5284913Z dist init r=0, world=2 2022-12-01T10:50:23.5285152Z dist init r=1, world=2 2022-12-01T10:50:23.5285395Z ok (3.410s) 2022-12-01T10:50:23.5285837Z test_rekey_optim_state_dict_to_ids_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5286434Z Tests :meth:`rekey_optim_state_dict` with the new keys being ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27491 2022-12-01T10:50:23.5286965Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27492 2022-12-01T10:50:23.5287591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5288048Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5288607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5289081Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5289664Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5290089Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5290663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5291222Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5291678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5292169Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5292838Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5293534Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5294067Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5294524Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5295393Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5295956Z warnings.warn( 2022-12-01T10:50:23.5296722Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5297251Z warnings.warn( 2022-12-01T10:50:23.5298024Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5298582Z warnings.warn( 2022-12-01T10:50:23.5299349Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5299880Z warnings.warn( 2022-12-01T10:50:23.5300748Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 
2022-12-01T10:50:23.5301327Z warnings.warn( 2022-12-01T10:50:23.5302117Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5302650Z warnings.warn( 2022-12-01T10:50:23.5303540Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5304157Z dist init r=0, world=2 2022-12-01T10:50:23.5304413Z dist init r=1, world=2 2022-12-01T10:50:23.5304639Z ok (4.111s) 2022-12-01T10:50:23.5305083Z test_rekey_optim_state_dict_to_ids_state_dict_type_StateDictType_FULL_STATE_DICT_use_multiple_param_groups_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5305696Z Tests :meth:`rekey_optim_state_dict` with the new keys being ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27574 2022-12-01T10:50:23.5306205Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27575 2022-12-01T10:50:23.5306857Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5307311Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5307891Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5308446Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5309037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5309485Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5310058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5310505Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5310959Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5311468Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5312111Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5312805Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5313329Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5313803Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5314654Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:50:23.5315205Z warnings.warn( 2022-12-01T10:50:23.5315961Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5316511Z warnings.warn( 2022-12-01T10:50:23.5317328Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5317900Z warnings.warn( 2022-12-01T10:50:23.5318669Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5319223Z warnings.warn( 2022-12-01T10:50:23.5319996Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5320556Z warnings.warn( 2022-12-01T10:50:23.5321344Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5321904Z warnings.warn( 2022-12-01T10:50:23.5323009Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5323635Z dist init r=0, world=2 2022-12-01T10:50:23.5323893Z dist init r=1, world=2 2022-12-01T10:50:23.5324135Z ok (4.211s) 2022-12-01T10:50:23.5324558Z test_rekey_optim_state_dict_to_ids_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5325290Z Tests :meth:`rekey_optim_state_dict` with the new keys being ... 
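Editor's note: the repeated UserWarnings about `torch.distributed._all_gather_base` and `torch.distributed._reduce_scatter_base` are emitted from FSDP's internals; the public replacements named in the warnings are `all_gather_into_tensor` and `reduce_scatter_tensor`. A minimal call-site migration sketch follows, assuming a single CUDA device and the NCCL backend (these tensor-variant collectives are not guaranteed on other backends); with world_size 1 the collectives are trivial, but the calls and shapes are the ones the warnings recommend.

```python
# Hedged sketch of the migration suggested by the deprecation warnings above.
# Assumes one CUDA device and the NCCL backend; the MASTER_ADDR/PORT values
# are placeholders for illustration.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("nccl", rank=0, world_size=1)
torch.cuda.set_device(0)

world = dist.get_world_size()
x = torch.ones(4, device="cuda")

# Before: dist._all_gather_base(out, x)   (private, deprecated)
out = torch.empty(4 * world, device="cuda")
dist.all_gather_into_tensor(out, x)

# Before: dist._reduce_scatter_base(y, out)   (private, deprecated)
y = torch.empty(4, device="cuda")
dist.reduce_scatter_tensor(y, out)

dist.destroy_process_group()
```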
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27657 2022-12-01T10:50:23.5325822Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27658 2022-12-01T10:50:23.5326429Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5326890Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5327467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5327940Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5328500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5328945Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5329516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5329989Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5330431Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5330940Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5331599Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5332270Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5332794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5333269Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5334231Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5334789Z warnings.warn( 2022-12-01T10:50:23.5335557Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5336104Z warnings.warn( 2022-12-01T10:50:23.5336881Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5337415Z warnings.warn( 2022-12-01T10:50:23.5338179Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5338736Z warnings.warn( 2022-12-01T10:50:23.5339535Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 
2022-12-01T10:50:23.5340085Z warnings.warn( 2022-12-01T10:50:23.5340879Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5341439Z warnings.warn( 2022-12-01T10:50:23.5342329Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5343664Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5344255Z dist init r=0, world=2 2022-12-01T10:50:23.5344514Z dist init r=1, world=2 2022-12-01T10:50:23.5344760Z ok (4.211s) 2022-12-01T10:50:23.5345182Z test_rekey_optim_state_dict_to_ids_state_dict_type_StateDictType_SHARDED_STATE_DICT_use_multiple_param_groups_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5345804Z Tests :meth:`rekey_optim_state_dict` with the new keys being ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27740 2022-12-01T10:50:23.5346341Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27741 2022-12-01T10:50:23.5346965Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5347403Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5347985Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5348461Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5349041Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5349467Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5350038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5350510Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5350946Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5351522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5352209Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5352910Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:50:23.5353419Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5353898Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5354771Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5355333Z warnings.warn( 2022-12-01T10:50:23.5356080Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5356631Z warnings.warn( 2022-12-01T10:50:23.5357405Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5357963Z warnings.warn( 2022-12-01T10:50:23.5358711Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5359339Z warnings.warn( 2022-12-01T10:50:23.5360141Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5360710Z warnings.warn( 2022-12-01T10:50:23.5361471Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5362023Z warnings.warn( 2022-12-01T10:50:23.5363156Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5364421Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5365033Z dist init r=0, world=2 2022-12-01T10:50:23.5365270Z dist init r=1, world=2 2022-12-01T10:50:23.5365512Z ok (4.211s) 2022-12-01T10:50:23.5365840Z test_rekey_optim_state_dict_to_names (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5366336Z Tests :meth:`rekey_optim_state_dict` with the new keys being ... 
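Editor's note: the TestFSDPOptimState cases above exercise `FSDP.rekey_optim_state_dict`, which converts the keys of a consolidated optimizer state dict between parameter names and parameter IDs. A rough single-rank sketch of that call pattern follows; the toy model, the `OptimStateKeyType` import path, and the use of the `optim` keyword (in place of the deprecated `optim_input` flagged in the warnings above) are assumptions based on the PyTorch 1.13-era API, not taken from this log.

```python
# Hedged, single-rank sketch of the rekey_optim_state_dict API these tests
# exercise. Assumes one CUDA device, the NCCL backend, and the 1.13-era
# import path for OptimStateKeyType; the toy Linear model is made up.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.fully_sharded_data_parallel import OptimStateKeyType

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29502")
dist.init_process_group("nccl", rank=0, world_size=1)
torch.cuda.set_device(0)

model = FSDP(torch.nn.Linear(8, 8).cuda())
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(2, 8, device="cuda")).sum().backward()
optim.step()

# Consolidate the sharded optimizer state into a full, name-keyed state dict.
# Per the deprecation warning above, `optim_input` is omitted.
full_osd = FSDP.full_optim_state_dict(model, optim)

# Rekey from parameter names to parameter IDs (what the *_to_ids tests check);
# OptimStateKeyType.PARAM_NAME would go the other way.
id_keyed_osd = FSDP.rekey_optim_state_dict(
    full_osd, OptimStateKeyType.PARAM_ID, model, optim=optim
)
dist.destroy_process_group()
```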
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27823 2022-12-01T10:50:23.5366865Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27824 2022-12-01T10:50:23.5367480Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5367941Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5368595Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5369088Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5369676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5370105Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5370676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5371142Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5371598Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5372092Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5372756Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5373455Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5373982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5374436Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5375304Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5375854Z warnings.warn( 2022-12-01T10:50:23.5376722Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5377256Z warnings.warn( 2022-12-01T10:50:23.5378030Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5378579Z warnings.warn( 2022-12-01T10:50:23.5379351Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5379873Z warnings.warn( 2022-12-01T10:50:23.5380670Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 
2022-12-01T10:50:23.5381235Z warnings.warn( 2022-12-01T10:50:23.5382030Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5382559Z warnings.warn( 2022-12-01T10:50:23.5382811Z dist init r=0, world=2 2022-12-01T10:50:23.5383068Z dist init r=1, world=2 2022-12-01T10:50:23.5383290Z ok (4.211s) 2022-12-01T10:50:23.5383648Z test_scatter_full_optim_state_dict_nested_halve_world_size (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5384344Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27906 2022-12-01T10:50:23.5384894Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27907 2022-12-01T10:50:23.5385564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5386038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5386625Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5387078Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5387657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5388102Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5388675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5389131Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5389588Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5390098Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5390745Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5391445Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5391972Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5392450Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5393302Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5393941Z warnings.warn( 2022-12-01T10:50:23.5394706Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5395254Z warnings.warn( 2022-12-01T10:50:23.5396007Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. 
Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5396560Z warnings.warn( 2022-12-01T10:50:23.5397321Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5397879Z warnings.warn( 2022-12-01T10:50:23.5398244Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:50:23.5398755Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:50:23.5399413Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:50:23.5400107Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:50:23.5401042Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5401601Z warnings.warn( 2022-12-01T10:50:23.5402700Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5403302Z warnings.warn( 2022-12-01T10:50:23.5403669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-12-01T10:50:23.5404175Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-12-01T10:50:23.5404843Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:50:23.5405895Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5406829Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:50:23.5407247Z dist init r=0, world=2 2022-12-01T10:50:23.5407505Z dist init r=1, world=2 2022-12-01T10:50:23.5407749Z ok (4.311s) 2022-12-01T10:50:23.5408175Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5408933Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27999 2022-12-01T10:50:23.5409471Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28000 2022-12-01T10:50:23.5410058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5410517Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5411214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5411698Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5412262Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5412720Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5413299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5413772Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5414212Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5414708Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5415367Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5416044Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5416571Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5417046Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5417919Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5418481Z warnings.warn( 2022-12-01T10:50:23.5419224Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5419765Z warnings.warn( 2022-12-01T10:50:23.5420605Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5421179Z warnings.warn( 2022-12-01T10:50:23.5421938Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5422493Z warnings.warn( 2022-12-01T10:50:23.5423400Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:50:23.5424662Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5425907Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5427176Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5428419Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5428967Z warnings.warn( 2022-12-01T10:50:23.5429749Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5430306Z warnings.warn( 2022-12-01T10:50:23.5430557Z dist init r=1, world=2 2022-12-01T10:50:23.5430789Z dist init r=0, world=2 2022-12-01T10:50:23.5431029Z ok (4.411s) 2022-12-01T10:50:23.5431465Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5432211Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28082 2022-12-01T10:50:23.5432750Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28083 2022-12-01T10:50:23.5433356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5433810Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5434363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5434835Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5435413Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5435844Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5436432Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5436995Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5437464Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5437950Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5438632Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5439346Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5439879Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5440361Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5441233Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5441809Z warnings.warn( 2022-12-01T10:50:23.5442806Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5443376Z warnings.warn( 2022-12-01T10:50:23.5444140Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5444710Z warnings.warn( 2022-12-01T10:50:23.5445601Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5446170Z warnings.warn( 2022-12-01T10:50:23.5447038Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:50:23.5448302Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5449569Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5450803Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5451958Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5452523Z warnings.warn( 2022-12-01T10:50:23.5453397Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5453955Z warnings.warn( 2022-12-01T10:50:23.5454207Z dist init r=0, world=2 2022-12-01T10:50:23.5454459Z dist init r=1, world=2 2022-12-01T10:50:23.5454685Z ok (4.412s) 2022-12-01T10:50:23.5455123Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5455912Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
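Editor's note: the `scatter_full_optim_state_dict` cases above check the save/load round trip in which rank 0 holds the consolidated optimizer state dict and every rank receives only its own shard to feed into `load_state_dict`. A rough single-rank sketch of that round trip follows; as in the sketch above, the toy model and the single-rank NCCL setup are assumptions, and `optim_input` is omitted per the deprecation warning in this log.

```python
# Hedged, single-rank sketch of the full -> scatter -> load round trip the
# scatter_full_optim_state_dict tests cover. Same assumptions as the sketch
# above: one CUDA device, NCCL, and a made-up toy model.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29503")
dist.init_process_group("nccl", rank=0, world_size=1)
torch.cuda.set_device(0)

model = FSDP(torch.nn.Linear(8, 8).cuda())
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(2, 8, device="cuda")).sum().backward()
optim.step()

# Rank 0 consolidates the full optimizer state (no `optim_input`, per the
# deprecation warning above); non-zero ranks would pass None below.
full_osd = FSDP.full_optim_state_dict(model, optim) if dist.get_rank() == 0 else None

# Each rank receives only the shard of the state it owns...
sharded_osd = FSDP.scatter_full_optim_state_dict(full_osd, model, optim=optim)

# ...and restores its optimizer from that shard.
optim.load_state_dict(sharded_osd)
dist.destroy_process_group()
```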
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28165 2022-12-01T10:50:23.5458241Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28166 2022-12-01T10:50:23.5458878Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5459367Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5459952Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5460414Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5460981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5461461Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5462054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5462532Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5462973Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5463605Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5464278Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5464976Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5465488Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5465966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5466838Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5467397Z warnings.warn( 2022-12-01T10:50:23.5468138Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5468693Z warnings.warn( 2022-12-01T10:50:23.5469475Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5470023Z warnings.warn( 2022-12-01T10:50:23.5470772Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5471322Z warnings.warn( 2022-12-01T10:50:23.5472119Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 
2022-12-01T10:50:23.5472682Z warnings.warn( 2022-12-01T10:50:23.5473515Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5474097Z warnings.warn( 2022-12-01T10:50:23.5474993Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5476235Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5476865Z dist init r=0, world=2 2022-12-01T10:50:23.5477103Z dist init r=1, world=2 2022-12-01T10:50:23.5477346Z ok (4.211s) 2022-12-01T10:50:23.5477784Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5478532Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28248 2022-12-01T10:50:23.5479071Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28249 2022-12-01T10:50:23.5479683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5480140Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5480786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5481265Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5481848Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5482273Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5483191Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5483660Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5484117Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5484601Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5485264Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5485963Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:50:23.5486492Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5486951Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5487814Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5488368Z warnings.warn( 2022-12-01T10:50:23.5489135Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5489661Z warnings.warn( 2022-12-01T10:50:23.5490555Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5491134Z warnings.warn( 2022-12-01T10:50:23.5491911Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5492436Z warnings.warn( 2022-12-01T10:50:23.5493231Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5493797Z warnings.warn( 2022-12-01T10:50:23.5494588Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5495119Z warnings.warn( 2022-12-01T10:50:23.5496009Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5497249Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5497969Z dist init r=0, world=2 2022-12-01T10:50:23.5498205Z dist init r=1, world=2 2022-12-01T10:50:23.5498448Z ok (4.311s) 2022-12-01T10:50:23.5498896Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5499668Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28331 2022-12-01T10:50:23.5500193Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28332 2022-12-01T10:50:23.5500806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5501261Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5501823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5502304Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5502894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5503345Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5503901Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5504375Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5504838Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5505349Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5505989Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5506721Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5507315Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5507786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5508669Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5509221Z warnings.warn( 2022-12-01T10:50:23.5509982Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5510510Z warnings.warn( 2022-12-01T10:50:23.5511287Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5511848Z warnings.warn( 2022-12-01T10:50:23.5512614Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5513137Z warnings.warn( 2022-12-01T10:50:23.5514028Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:50:23.5515369Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5516608Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5517837Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5519007Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5519570Z warnings.warn( 2022-12-01T10:50:23.5520336Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5520900Z warnings.warn( 2022-12-01T10:50:23.5521149Z dist init r=1, world=2 2022-12-01T10:50:23.5521388Z dist init r=0, world=2 2022-12-01T10:50:23.5521626Z ok (4.412s) 2022-12-01T10:50:23.5522059Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5523082Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28414 2022-12-01T10:50:23.5523616Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28415 2022-12-01T10:50:23.5524320Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5524797Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5525360Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5525834Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5526416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5526860Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5527412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5527883Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5528342Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5528827Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5529493Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5530187Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5530711Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5531164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5532038Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5532697Z warnings.warn( 2022-12-01T10:50:23.5533467Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5533989Z warnings.warn( 2022-12-01T10:50:23.5534758Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5535312Z warnings.warn( 2022-12-01T10:50:23.5536075Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5536598Z warnings.warn( 2022-12-01T10:50:23.5537492Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:50:23.5538732Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5539975Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5541278Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5542460Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5543025Z warnings.warn( 2022-12-01T10:50:23.5543798Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5544360Z warnings.warn( 2022-12-01T10:50:23.5544619Z dist init r=0, world=2 2022-12-01T10:50:23.5544854Z dist init r=1, world=2 2022-12-01T10:50:23.5545094Z ok (4.311s) 2022-12-01T10:50:23.5545531Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5546271Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28497 2022-12-01T10:50:23.5546810Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28498 2022-12-01T10:50:23.5547418Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5547867Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5548511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5548987Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5549567Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5550012Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5550559Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5551027Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5551484Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5551969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5552637Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5553341Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5553871Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5554328Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5555190Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5555744Z warnings.warn( 2022-12-01T10:50:23.5556503Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5557037Z warnings.warn( 2022-12-01T10:50:23.5557869Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5558440Z warnings.warn( 2022-12-01T10:50:23.5559215Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5559737Z warnings.warn( 2022-12-01T10:50:23.5560527Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 
2022-12-01T10:50:23.5561091Z warnings.warn( 2022-12-01T10:50:23.5561875Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5562650Z warnings.warn( 2022-12-01T10:50:23.5563563Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5564808Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5565538Z dist init r=1, world=2 2022-12-01T10:50:23.5565795Z dist init r=0, world=2 2022-12-01T10:50:23.5566019Z ok (4.311s) 2022-12-01T10:50:23.5566461Z test_scatter_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5567223Z Tests :meth:`scatter_full_optim_state_dict` for a non-FSDP-root ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28580 2022-12-01T10:50:23.5567739Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28581 2022-12-01T10:50:23.5568349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5568805Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5569384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5569841Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5570425Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5570876Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5571426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5571890Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5572344Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5572851Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5573492Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5574191Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
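[editor's note] The pair of UserWarnings repeated above for every test process points at the public replacements for the private collectives: torch.distributed.all_gather_into_tensor and torch.distributed.reduce_scatter_tensor. Below is a minimal, hedged sketch of that migration; it assumes an already-initialized process group whose backend supports reduce-scatter (e.g. NCCL), and the 1-D tensor shape is illustrative, not taken from the test suite.

    # Sketch of the replacements named in the deprecation warnings above.
    # Assumes torch.distributed is already initialized; `local` is assumed 1-D.
    import torch
    import torch.distributed as dist

    def gather_then_reduce_scatter(local: torch.Tensor) -> torch.Tensor:
        world_size = dist.get_world_size()

        # Replacement for the deprecated dist._all_gather_base(output, input):
        gathered = torch.empty(world_size * local.shape[0],
                               dtype=local.dtype, device=local.device)
        dist.all_gather_into_tensor(gathered, local)

        # Replacement for the deprecated dist._reduce_scatter_base(output, input):
        shard = torch.empty_like(local)
        dist.reduce_scatter_tensor(shard, gathered, op=dist.ReduceOp.SUM)
        return shard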
2022-12-01T10:50:23.5574802Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5575298Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5576161Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5576713Z warnings.warn( 2022-12-01T10:50:23.5577473Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5578025Z warnings.warn( 2022-12-01T10:50:23.5578782Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5579337Z warnings.warn( 2022-12-01T10:50:23.5580113Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5580663Z warnings.warn( 2022-12-01T10:50:23.5581441Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5582003Z warnings.warn( 2022-12-01T10:50:23.5582794Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5583435Z warnings.warn( 2022-12-01T10:50:23.5584312Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5585566Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5586176Z dist init r=1, world=2 2022-12-01T10:50:23.5586433Z dist init r=0, world=2 2022-12-01T10:50:23.5586658Z ok (4.211s) 2022-12-01T10:50:23.5587004Z test_scatter_full_optim_state_dict_transformer (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5587674Z Tests :meth:`scatter_full_optim_state_dict` for an FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28663 2022-12-01T10:50:23.5588211Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28664 2022-12-01T10:50:23.5588804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5589259Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5589837Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5590293Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5590875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5591323Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5591957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5592424Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5592882Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5593384Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5594051Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5594727Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5595259Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5595734Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5596612Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5597151Z warnings.warn( 2022-12-01T10:50:23.5597911Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5598452Z warnings.warn( 2022-12-01T10:50:23.5599225Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5599832Z warnings.warn( 2022-12-01T10:50:23.5600602Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:50:23.5601157Z warnings.warn( 2022-12-01T10:50:23.5601522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:50:23.5602029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:50:23.5602888Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:50:23.5603585Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:50:23.5604626Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5605877Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5606647Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-12-01T10:50:23.5607152Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-12-01T10:50:23.5607811Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:50:23.5608486Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:50:23.5609000Z dist init r=0, world=2 2022-12-01T10:50:23.5609281Z dist init r=1, world=2 2022-12-01T10:50:23.5609504Z ok (4.712s) 2022-12-01T10:50:23.5609860Z test_shard_full_optim_state_dict_nested_halve_world_size (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5610559Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28756 2022-12-01T10:50:23.5611103Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28757 2022-12-01T10:50:23.5611697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5612149Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5612730Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5613206Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5613770Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5614217Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5614789Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5615239Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5615696Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5616198Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5616853Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5617628Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5618154Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5618621Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5619491Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5620020Z warnings.warn( 2022-12-01T10:50:23.5620783Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5621335Z warnings.warn( 2022-12-01T10:50:23.5622113Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5622643Z warnings.warn( 2022-12-01T10:50:23.5623411Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:50:23.5623961Z warnings.warn( 2022-12-01T10:50:23.5624324Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:50:23.5624834Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:50:23.5625497Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:50:23.5626257Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:50:23.5627212Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5627783Z warnings.warn( 2022-12-01T10:50:23.5628568Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5629127Z warnings.warn( 2022-12-01T10:50:23.5629488Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-12-01T10:50:23.5629993Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-12-01T10:50:23.5630655Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:50:23.5631706Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5632614Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:50:23.5633001Z dist init r=0, world=2 2022-12-01T10:50:23.5633256Z dist init r=1, world=2 2022-12-01T10:50:23.5633500Z ok (4.312s) 2022-12-01T10:50:23.5633993Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5634765Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28849 2022-12-01T10:50:23.5635304Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28850 2022-12-01T10:50:23.5635917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5636349Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5636921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5637391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5637950Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5638400Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5638969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5639433Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5639868Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5640370Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5641026Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5641715Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5642222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5642950Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5643943Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5644519Z warnings.warn( 2022-12-01T10:50:23.5645268Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5645817Z warnings.warn( 2022-12-01T10:50:23.5646590Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5647144Z warnings.warn( 2022-12-01T10:50:23.5647901Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5648452Z warnings.warn( 2022-12-01T10:50:23.5649343Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:50:23.5650602Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5651867Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5652418Z warnings.warn( 2022-12-01T10:50:23.5653210Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5653770Z warnings.warn( 2022-12-01T10:50:23.5654020Z dist init r=0, world=2 2022-12-01T10:50:23.5654254Z dist init r=1, world=2 2022-12-01T10:50:23.5654499Z ok (4.312s) 2022-12-01T10:50:23.5654932Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5655682Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28932 2022-12-01T10:50:23.5656235Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28933 2022-12-01T10:50:23.5656853Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5657305Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5657865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5658344Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5658925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5659353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5659930Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5660465Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5660945Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5661434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5662104Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5662799Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
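[editor's note] The test_shard_full_optim_state_dict_* cases running here exercise the consolidate-then-reshard flow for FSDP optimizer state. The sketch below is written against the public FSDP static methods and simply omits the deprecated `optim_input` argument, as the repeated warning says you may; the model and optimizer are placeholders and an initialized process group is assumed.

    # Hedged sketch of the full -> sharded optimizer-state round trip.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def roundtrip_optim_state(model: FSDP, optim: torch.optim.Optimizer) -> None:
        # Consolidate sharded optimizer state into a full, unflattened state
        # dict on every rank (rank0_only=False keeps a copy on all ranks so
        # each rank can re-shard locally).
        full_osd = FSDP.full_optim_state_dict(model, optim, rank0_only=False)

        # Re-shard for this rank's flattened parameters; the deprecated
        # `optim_input` argument is simply dropped, per the warning.
        sharded_osd = FSDP.shard_full_optim_state_dict(full_osd, model)
        optim.load_state_dict(sharded_osd)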
2022-12-01T10:50:23.5663317Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5663776Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5664653Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5665214Z warnings.warn( 2022-12-01T10:50:23.5665979Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5666492Z warnings.warn( 2022-12-01T10:50:23.5667262Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5667817Z warnings.warn( 2022-12-01T10:50:23.5668675Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5669204Z warnings.warn( 2022-12-01T10:50:23.5670097Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5671340Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5672496Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5673057Z warnings.warn( 2022-12-01T10:50:23.5673836Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5674383Z warnings.warn( 2022-12-01T10:50:23.5674634Z dist init r=0, world=2 2022-12-01T10:50:23.5674885Z dist init r=1, world=2 2022-12-01T10:50:23.5675106Z ok (4.311s) 2022-12-01T10:50:23.5675540Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5676307Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29015 2022-12-01T10:50:23.5676859Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29016 2022-12-01T10:50:23.5677521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5677988Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5678574Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5679027Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5679608Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5680061Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5680637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5681097Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5681556Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5682070Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5682994Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5683682Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5684212Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5684685Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5685538Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5686218Z warnings.warn( 2022-12-01T10:50:23.5686992Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5687538Z warnings.warn( 2022-12-01T10:50:23.5688289Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5688848Z warnings.warn( 2022-12-01T10:50:23.5689619Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5690179Z warnings.warn( 2022-12-01T10:50:23.5691081Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:50:23.5692235Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5692802Z warnings.warn( 2022-12-01T10:50:23.5693604Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5694158Z warnings.warn( 2022-12-01T10:50:23.5695119Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5695742Z dist init r=0, world=2 2022-12-01T10:50:23.5696000Z dist init r=1, world=2 2022-12-01T10:50:23.5696243Z ok (4.311s) 2022-12-01T10:50:23.5696658Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_False_wrap_alt_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5697423Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29098 2022-12-01T10:50:23.5697970Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29099 2022-12-01T10:50:23.5698586Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5699025Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5699608Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5700085Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5700667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5701101Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5701679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5702152Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5702678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5703189Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5703861Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5704555Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
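[editor's note] The test_scatter_full_optim_state_dict_* cases earlier in this shard cover the scatter variant of the same flow, where only rank 0 needs to hold the consolidated state dict. A hedged sketch under the same assumptions (initialized process group, placeholder model and optimizer):

    # Hedged sketch of the rank-0 scatter variant.
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def scatter_optim_state(model: FSDP, optim: torch.optim.Optimizer) -> None:
        # Consolidate onto rank 0 only (rank0_only=True is the default).
        full_osd = FSDP.full_optim_state_dict(model, optim)
        if dist.get_rank() != 0:
            full_osd = None  # nonzero ranks pass None and just receive their shard

        # Rank 0 scatters each rank's shard; again no `optim_input` is passed.
        sharded_osd = FSDP.scatter_full_optim_state_dict(full_osd, model)
        optim.load_state_dict(sharded_osd)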
2022-12-01T10:50:23.5705062Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5705539Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5706417Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5707017Z warnings.warn( 2022-12-01T10:50:23.5707771Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5708324Z warnings.warn( 2022-12-01T10:50:23.5709099Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5709653Z warnings.warn( 2022-12-01T10:50:23.5710392Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5710956Z warnings.warn( 2022-12-01T10:50:23.5711910Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5713090Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5713650Z warnings.warn( 2022-12-01T10:50:23.5714423Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5714982Z warnings.warn( 2022-12-01T10:50:23.5715869Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5716493Z dist init r=0, world=2 2022-12-01T10:50:23.5716730Z dist init r=1, world=2 2022-12-01T10:50:23.5716976Z ok (4.311s) 2022-12-01T10:50:23.5717410Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5718173Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29181 2022-12-01T10:50:23.5718693Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29182 2022-12-01T10:50:23.5719389Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5719842Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5720408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5720885Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5721468Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5721911Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5722696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5723185Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5723639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5724159Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5724807Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5725495Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5726027Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5726502Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5727356Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5727918Z warnings.warn( 2022-12-01T10:50:23.5728799Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5729365Z warnings.warn( 2022-12-01T10:50:23.5730127Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5730688Z warnings.warn( 2022-12-01T10:50:23.5731457Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5732003Z warnings.warn( 2022-12-01T10:50:23.5732883Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:50:23.5734135Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5735278Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5735835Z warnings.warn( 2022-12-01T10:50:23.5736638Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5737267Z warnings.warn( 2022-12-01T10:50:23.5737524Z dist init r=0, world=2 2022-12-01T10:50:23.5737779Z dist init r=1, world=2 2022-12-01T10:50:23.5738000Z ok (4.311s) 2022-12-01T10:50:23.5738439Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_False_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5739208Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29264 2022-12-01T10:50:23.5739757Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29265 2022-12-01T10:50:23.5740350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5740811Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5741393Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5741870Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5742434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5742883Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5743456Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5743923Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5744360Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5744861Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5745592Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5746292Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
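[editor's note] The "dist init r=<rank>, world=2" lines and the store_based_barrier_key INFO messages come from each spawned worker initializing its process group; init_process_group synchronizes ranks through the TCP store and logs those barrier completions. A rough, self-contained sketch of that per-process setup (gloo backend and the address/port are placeholders, not what the harness uses):

    # Minimal two-process init mirroring the "dist init r=..., world=2" lines.
    import os
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def worker(rank: int, world_size: int) -> None:
        os.environ["MASTER_ADDR"] = "127.0.0.1"
        os.environ["MASTER_PORT"] = "29500"  # placeholder port
        # Completing this call is what emits the store-based barrier messages.
        dist.init_process_group("gloo", rank=rank, world_size=world_size)
        print(f"dist init r={rank}, world={world_size}")
        dist.barrier()
        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(worker, args=(2,), nprocs=2)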
2022-12-01T10:50:23.5746819Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5747295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5748161Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5748697Z warnings.warn( 2022-12-01T10:50:23.5749464Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5750024Z warnings.warn( 2022-12-01T10:50:23.5750789Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5751329Z warnings.warn( 2022-12-01T10:50:23.5752096Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5752648Z warnings.warn( 2022-12-01T10:50:23.5753538Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5754889Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5756035Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5756601Z warnings.warn( 2022-12-01T10:50:23.5757435Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5757995Z warnings.warn( 2022-12-01T10:50:23.5758226Z dist init r=0, world=2 2022-12-01T10:50:23.5758477Z dist init r=1, world=2 2022-12-01T10:50:23.5758714Z ok (4.311s) 2022-12-01T10:50:23.5759145Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5759887Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29347 2022-12-01T10:50:23.5760431Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29348 2022-12-01T10:50:23.5761041Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5761475Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5762061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5762855Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5763479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5763910Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5764485Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5764952Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5765409Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5765894Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5766558Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5767260Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5767785Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5768238Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5769105Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5769660Z warnings.warn( 2022-12-01T10:50:23.5770414Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5771045Z warnings.warn( 2022-12-01T10:50:23.5771830Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5772385Z warnings.warn( 2022-12-01T10:50:23.5773144Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5773665Z warnings.warn( 2022-12-01T10:50:23.5774557Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:50:23.5775725Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5776280Z warnings.warn( 2022-12-01T10:50:23.5777046Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5777607Z warnings.warn( 2022-12-01T10:50:23.5778494Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5779112Z dist init r=0, world=2 2022-12-01T10:50:23.5779365Z dist init r=1, world=2 2022-12-01T10:50:23.5779580Z ok (4.211s) 2022-12-01T10:50:23.5780070Z test_shard_full_optim_state_dict_nested_use_multiple_param_groups_True_wrap_alt_True_use_diff_optim_inputs_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5780860Z Tests :meth:`shard_full_optim_state_dict` for a non-FSDP-root model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29430 2022-12-01T10:50:23.5781387Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29431 2022-12-01T10:50:23.5782001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5782448Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5783023Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5783482Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5784070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5784521Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5785070Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5785538Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5785994Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5786494Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5787140Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5787922Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
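[editor's note] Most of the volume in this shard is the same UserWarnings re-emitted by every spawned test process (slow/disabled test counts, the private-collective deprecations, `optim_input`). When re-running locally, Python's standard warnings filter can silence specific messages; this is a generic stdlib approach, not something the test harness itself configures, and filters set in the parent do not automatically apply to spawned workers.

    # Generic stdlib filtering of the warnings that repeat for every process.
    import warnings

    warnings.filterwarnings("ignore", message=r"loaded \d+ slow tests")
    warnings.filterwarnings("ignore", message=r"loaded \d+ disabled tests")
    warnings.filterwarnings(
        "ignore", message=r"torch\.distributed\._all_gather_base is a private function")
    warnings.filterwarnings(
        "ignore", message=r"The `optim_input` argument is deprecated")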
2022-12-01T10:50:23.5788451Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5788919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5789766Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5790321Z warnings.warn( 2022-12-01T10:50:23.5791074Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5791608Z warnings.warn( 2022-12-01T10:50:23.5792364Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5792929Z warnings.warn( 2022-12-01T10:50:23.5793694Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5794249Z warnings.warn( 2022-12-01T10:50:23.5795122Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5796353Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5796933Z warnings.warn( 2022-12-01T10:50:23.5797720Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5798280Z warnings.warn( 2022-12-01T10:50:23.5799150Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5799773Z dist init r=0, world=2 2022-12-01T10:50:23.5800029Z dist init r=1, world=2 2022-12-01T10:50:23.5800264Z ok (4.311s) 2022-12-01T10:50:23.5800583Z test_shard_full_optim_state_dict_transformer (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5801246Z Tests :meth:`shard_full_optim_state_dict` for an FSDP-root ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29513 2022-12-01T10:50:23.5801773Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29514 2022-12-01T10:50:23.5802598Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5803057Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5803638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5804108Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5804668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5805217Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5805798Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5806260Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5806732Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5807237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5807897Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5808586Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5809095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5809571Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5810443Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5810994Z warnings.warn( 2022-12-01T10:50:23.5811739Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5812274Z warnings.warn( 2022-12-01T10:50:23.5813044Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5813596Z warnings.warn( 2022-12-01T10:50:23.5814428Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:50:23.5814996Z warnings.warn( 2022-12-01T10:50:23.5815378Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T10:50:23.5815862Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T10:50:23.5816525Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:50:23.5817210Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T10:50:23.5818259Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5819499Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5820229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-12-01T10:50:23.5820726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-12-01T10:50:23.5821382Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:50:23.5822160Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-12-01T10:50:23.5822550Z dist init r=1, world=2 2022-12-01T10:50:23.5822803Z dist init r=0, world=2 2022-12-01T10:50:23.5823046Z ok (4.812s) 2022-12-01T10:50:23.5823477Z test_shard_full_optim_state_dict_unmanaged_params_state_dict_type_StateDictType_FULL_STATE_DICT_add_to_fsdp_module_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5824122Z Tests :meth:`shard_full_optim_state_dict` when there are unmanaged ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29606 2022-12-01T10:50:23.5824656Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29607 2022-12-01T10:50:23.5825269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5825710Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5826288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5826758Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5827337Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5827762Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5828334Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5828798Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5829234Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5829738Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5830396Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5831151Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5831675Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5832145Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5833021Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5833565Z warnings.warn( 2022-12-01T10:50:23.5834306Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5834854Z warnings.warn( 2022-12-01T10:50:23.5835627Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5836174Z warnings.warn( 2022-12-01T10:50:23.5836915Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5837455Z warnings.warn( 2022-12-01T10:50:23.5838248Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 
2022-12-01T10:50:23.5838432Z warnings.warn( 2022-12-01T10:50:23.5839095Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5839208Z warnings.warn( 2022-12-01T10:50:23.5839303Z dist init r=0, world=2 2022-12-01T10:50:23.5839413Z dist init r=1, world=2 2022-12-01T10:50:23.5839519Z ok (4.111s) 2022-12-01T10:50:23.5839837Z test_shard_full_optim_state_dict_unmanaged_params_state_dict_type_StateDictType_FULL_STATE_DICT_add_to_fsdp_module_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5840145Z Tests :meth:`shard_full_optim_state_dict` when there are unmanaged ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29689 2022-12-01T10:50:23.5840367Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29690 2022-12-01T10:50:23.5840741Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5840921Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5841286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5841479Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5841845Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5842017Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5842687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5842898Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5843152Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5843506Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5843926Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5844325Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5844558Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5844786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5845412Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5845530Z warnings.warn( 2022-12-01T10:50:23.5846158Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:50:23.5846269Z warnings.warn( 2022-12-01T10:50:23.5846901Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5847011Z warnings.warn( 2022-12-01T10:50:23.5847618Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5847818Z warnings.warn( 2022-12-01T10:50:23.5848482Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5848594Z warnings.warn( 2022-12-01T10:50:23.5849243Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5849353Z warnings.warn( 2022-12-01T10:50:23.5849467Z dist init r=0, world=2 2022-12-01T10:50:23.5849576Z dist init r=1, world=2 2022-12-01T10:50:23.5849677Z ok (4.011s) 2022-12-01T10:50:23.5849983Z test_shard_full_optim_state_dict_unmanaged_params_state_dict_type_StateDictType_SHARDED_STATE_DICT_add_to_fsdp_module_False (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5850301Z Tests :meth:`shard_full_optim_state_dict` when there are unmanaged ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29772 2022-12-01T10:50:23.5850526Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29773 2022-12-01T10:50:23.5850902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5851079Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5851457Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5851649Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5852017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5852187Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5852544Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5852791Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5853055Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5853301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5853706Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5854104Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:50:23.5854336Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5854565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5855195Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5855292Z warnings.warn( 2022-12-01T10:50:23.5855913Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5856024Z warnings.warn( 2022-12-01T10:50:23.5856651Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5856760Z warnings.warn( 2022-12-01T10:50:23.5857387Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5857570Z warnings.warn( 2022-12-01T10:50:23.5858324Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5859071Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5859727Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5859847Z warnings.warn( 2022-12-01T10:50:23.5860499Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5860593Z warnings.warn( 2022-12-01T10:50:23.5861330Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5862118Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:50:23.5862248Z dist init r=0, world=2 2022-12-01T10:50:23.5862360Z dist init r=1, world=2 2022-12-01T10:50:23.5862460Z ok (4.111s) 2022-12-01T10:50:23.5862790Z test_shard_full_optim_state_dict_unmanaged_params_state_dict_type_StateDictType_SHARDED_STATE_DICT_add_to_fsdp_module_True (__main__.TestFSDPOptimState) 2022-12-01T10:50:23.5863102Z Tests :meth:`shard_full_optim_state_dict` when there are unmanaged ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29855 2022-12-01T10:50:23.5863323Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29856 2022-12-01T10:50:23.5863702Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5863863Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5864248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5864440Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5864807Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:23.5864982Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:23.5865357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:23.5865550Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:23.5865799Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:23.5866046Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:23.5866503Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5866903Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:23.5867132Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:23.5867361Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:23.5867983Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5868098Z warnings.warn( 2022-12-01T10:50:23.5868713Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:23.5868828Z warnings.warn( 2022-12-01T10:50:23.5869465Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:50:23.5869563Z warnings.warn( 2022-12-01T10:50:23.5870196Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:23.5870309Z warnings.warn( 2022-12-01T10:50:23.5871061Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5871866Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5872538Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5872650Z warnings.warn( 2022-12-01T10:50:23.5873287Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3685: UserWarning: The `optim_input` argument is deprecated and will be removed after PyTorch 1.13. You may remove it from your code without changing its functionality. 2022-12-01T10:50:23.5873402Z warnings.warn( 2022-12-01T10:50:23.5874144Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5874875Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:50:23.5874989Z dist init r=0, world=2 2022-12-01T10:50:23.5875098Z dist init r=1, world=2 2022-12-01T10:50:23.5875180Z ok (4.111s) 2022-12-01T10:50:23.5875204Z 2022-12-01T10:50:23.5875548Z ---------------------------------------------------------------------- 2022-12-01T10:50:23.5875670Z Ran 50 tests in 210.845s 2022-12-01T10:50:23.5875690Z 2022-12-01T10:50:23.5875785Z OK 2022-12-01T10:50:23.5875804Z 2022-12-01T10:50:23.5875932Z Generating XML reports... 2022-12-01T10:50:23.5876395Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_optim_state/TEST-TestFSDPOptimState-20221201104652.xml 2022-12-01T10:50:23.5876414Z 2022-12-01T10:50:23.5876844Z ##[endgroup] 2022-12-01T10:50:23.5877338Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_optim_state (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_optim_state_v66rnijf) 2022-12-01T10:50:23.5877358Z 2022-12-01T10:50:23.5877618Z Running distributed/fsdp/test_fsdp_checkpoint ... 
[2022-12-01 10:50:23.492610] 2022-12-01T10:50:23.5878106Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:50:23.492932] 2022-12-01T10:50:55.1086658Z 2022-12-01T10:50:55.1089404Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_checkpoint 2022-12-01T10:50:55.1090557Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_checkpoint_i61ocax5) 2022-12-01T10:50:55.1091021Z 2022-12-01T10:50:55.1091225Z Running tests... 2022-12-01T10:50:55.1091841Z ---------------------------------------------------------------------- 2022-12-01T10:50:55.1099733Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_checkpoint 2022-12-01T10:50:55.1100978Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=False)_offload_activations_False (__main__.TestFSDPCheckpoint) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:50:55.1101843Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29973 2022-12-01T10:50:55.1102890Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29974 2022-12-01T10:50:55.1103840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1104577Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1105255Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1106185Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1107129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1107759Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1109145Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1109942Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1110916Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:55.1112045Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:55.1112853Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1114035Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1114994Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:55.1115881Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:55.1117516Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:50:55.1118252Z warnings.warn( 2022-12-01T10:50:55.1119004Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:55.1119553Z warnings.warn( 2022-12-01T10:50:55.1120320Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1120882Z warnings.warn( 2022-12-01T10:50:55.1121622Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1122175Z warnings.warn( 2022-12-01T10:50:55.1122896Z dist init r=0, world=2 2022-12-01T10:50:55.1123227Z dist init r=1, world=2 2022-12-01T10:50:55.1123453Z ok (5.587s) 2022-12-01T10:50:55.1123998Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=False)_offload_activations_True (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30056 2022-12-01T10:50:55.1124627Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30057 2022-12-01T10:50:55.1125229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1125685Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1126261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1126734Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1127298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1127856Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1128459Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1128928Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1129363Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:55.1129862Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:55.1130523Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1131283Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1131813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:55.1132274Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:55.1133148Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:50:55.1133701Z warnings.warn( 2022-12-01T10:50:55.1134453Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:55.1134979Z warnings.warn( 2022-12-01T10:50:55.1135756Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1136407Z warnings.warn( 2022-12-01T10:50:55.1137182Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1137707Z warnings.warn( 2022-12-01T10:50:55.1137961Z dist init r=1, world=2 2022-12-01T10:50:55.1138215Z dist init r=0, world=2 2022-12-01T10:50:55.1138436Z ok (4.012s) 2022-12-01T10:50:55.1138974Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=True)_offload_activations_False (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30139 2022-12-01T10:50:55.1139595Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30140 2022-12-01T10:50:55.1140211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1140646Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1141226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1141697Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1142276Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1142701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1143270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1143732Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1144168Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:55.1144672Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:55.1145399Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1146112Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1146617Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:55.1147090Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:55.1147961Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:50:55.1148520Z warnings.warn( 2022-12-01T10:50:55.1149265Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:55.1149806Z warnings.warn( 2022-12-01T10:50:55.1150578Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1151131Z warnings.warn( 2022-12-01T10:50:55.1151872Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1152421Z warnings.warn( 2022-12-01T10:50:55.1152743Z dist init r=0, world=2 2022-12-01T10:50:55.1152996Z dist init r=1, world=2 2022-12-01T10:50:55.1153219Z ok (4.014s) 2022-12-01T10:50:55.1153758Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=True)_offload_activations_True (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30222 2022-12-01T10:50:55.1154388Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30223 2022-12-01T10:50:55.1154981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1155431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1156010Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1156483Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1157043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1157491Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1158068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1158516Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1158973Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:55.1159467Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:55.1160124Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1160794Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1161321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:55.1161797Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:55.1163329Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:50:55.1163901Z warnings.warn( 2022-12-01T10:50:55.1164668Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:55.1165214Z warnings.warn( 2022-12-01T10:50:55.1165987Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1166524Z warnings.warn( 2022-12-01T10:50:55.1167285Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1167839Z warnings.warn( 2022-12-01T10:50:55.1168089Z dist init r=1, world=2 2022-12-01T10:50:55.1168322Z dist init r=0, world=2 2022-12-01T10:50:55.1168560Z ok (4.112s) 2022-12-01T10:50:55.1169101Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=False)_offload_activations_False (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30305 2022-12-01T10:50:55.1169710Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30306 2022-12-01T10:50:55.1170325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1170874Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1171460Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1171916Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1172497Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1172943Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1173496Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1173964Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1174418Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:55.1174922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:55.1175565Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1176254Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1176777Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:55.1177247Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:55.1178098Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:50:55.1178651Z warnings.warn( 2022-12-01T10:50:55.1179411Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:55.1180031Z warnings.warn( 2022-12-01T10:50:55.1180802Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1181355Z warnings.warn( 2022-12-01T10:50:55.1182114Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1182657Z warnings.warn( 2022-12-01T10:50:55.1182887Z dist init r=1, world=2 2022-12-01T10:50:55.1183143Z dist init r=0, world=2 2022-12-01T10:50:55.1183388Z ok (4.012s) 2022-12-01T10:50:55.1183911Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=False)_offload_activations_True (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30388 2022-12-01T10:50:55.1184541Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30389 2022-12-01T10:50:55.1185154Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1185608Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1186167Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1186634Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1187215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1187713Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1188298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1188764Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1189220Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:55.1189705Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:55.1190366Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1191059Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1191581Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:55.1192037Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:55.1192903Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:50:55.1193453Z warnings.warn( 2022-12-01T10:50:55.1194190Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:55.1194732Z warnings.warn( 2022-12-01T10:50:55.1195500Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1196061Z warnings.warn( 2022-12-01T10:50:55.1196857Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1197417Z warnings.warn( 2022-12-01T10:50:55.1197665Z dist init r=0, world=2 2022-12-01T10:50:55.1197920Z dist init r=1, world=2 2022-12-01T10:50:55.1198141Z ok (4.011s) 2022-12-01T10:50:55.1198683Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=True)_offload_activations_False (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30471 2022-12-01T10:50:55.1199309Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30472 2022-12-01T10:50:55.1199905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1200364Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1200941Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1201414Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1201975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:50:55.1202924Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:50:55.1203552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:50:55.1204020Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:50:55.1204487Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:50:55.1204990Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:50:55.1205779Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1206454Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:50:55.1206981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:50:55.1207454Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:50:55.1208330Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:50:55.1208862Z warnings.warn( 2022-12-01T10:50:55.1209616Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:50:55.1210164Z warnings.warn( 2022-12-01T10:50:55.1210938Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1211473Z warnings.warn( 2022-12-01T10:50:55.1212230Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:50:55.1212779Z warnings.warn( 2022-12-01T10:50:55.1213031Z dist init r=0, world=2 2022-12-01T10:50:55.1213269Z dist init r=1, world=2 2022-12-01T10:50:55.1213508Z ok (4.012s) 2022-12-01T10:50:55.1214746Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=True)_offload_activations_True (__main__.TestFSDPCheckpoint) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/71349 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.002s) 2022-12-01T10:50:55.1215444Z 2022-12-01T10:50:55.1215722Z ---------------------------------------------------------------------- 2022-12-01T10:50:55.1216035Z Ran 8 tests in 29.762s 2022-12-01T10:50:55.1216198Z 2022-12-01T10:50:55.1216305Z OK (skipped=1) 2022-12-01T10:50:55.1216457Z 2022-12-01T10:50:55.1216581Z Generating XML reports... 2022-12-01T10:50:55.1217181Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_checkpoint/TEST-TestFSDPCheckpoint-20221201105025.xml 2022-12-01T10:50:55.1217544Z 2022-12-01T10:50:55.1217894Z ##[endgroup] 2022-12-01T10:50:55.1218525Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_checkpoint_i61ocax5) 2022-12-01T10:50:55.1218897Z 2022-12-01T10:50:55.1219163Z Running distributed/fsdp/test_fsdp_misc ... [2022-12-01 10:50:55.108810] 2022-12-01T10:50:55.1219817Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_misc.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:50:55.109070] 2022-12-01T10:51:56.5442950Z 2022-12-01T10:51:56.5443911Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_misc 2022-12-01T10:51:56.5445020Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_misc (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_misc_rod09kex) 2022-12-01T10:51:56.5445393Z 2022-12-01T10:51:56.5448449Z Running tests... 2022-12-01T10:51:56.5449099Z ---------------------------------------------------------------------- 2022-12-01T10:51:56.5449676Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_misc 2022-12-01T10:51:56.5450619Z test_cpu_init_with_sync_module_states (__main__.TestFSDPMisc) 2022-12-01T10:51:56.5451079Z Tests that passing ``sync_module_states=True`` raises an error for ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:51:56.5451571Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30589 2022-12-01T10:51:56.5452018Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30590 2022-12-01T10:51:56.5452644Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5453106Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5453692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5454167Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5454752Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5455210Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5455798Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5456272Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5456713Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5457216Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5457887Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5458566Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5459177Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5459762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5461026Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:51:56.5461816Z warnings.warn( 2022-12-01T10:51:56.5462944Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:51:56.5463698Z warnings.warn( 2022-12-01T10:51:56.5463939Z dist init r=1, world=2 2022-12-01T10:51:56.5464187Z dist init r=0, world=2 2022-12-01T10:51:56.5464425Z ok (5.002s) 2022-12-01T10:51:56.5464691Z test_device_id_auto_wrap (__main__.TestFSDPMisc) 2022-12-01T10:51:56.5465185Z Tests that ``auto_wrap_policy`` propagates ``device_id`` to all ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30668 2022-12-01T10:51:56.5465711Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30669 2022-12-01T10:51:56.5466307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5466758Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5467423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5467899Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5468462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5468910Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5469480Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5469940Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5470379Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5470877Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5471536Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5472213Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5472741Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5473224Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5473578Z dist init r=1, world=2 2022-12-01T10:51:56.5473812Z dist init r=0, world=2 2022-12-01T10:51:56.5474049Z ok (3.410s) 2022-12-01T10:51:56.5474350Z test_fsdp_cpu_init_stays_on_cpu (__main__.TestFSDPMisc) 2022-12-01T10:51:56.5474841Z Tests that passing a CPU module to FSDP preserves that the wrapped ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30747 2022-12-01T10:51:56.5475377Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30748 2022-12-01T10:51:56.5475990Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5476509Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5477088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5477557Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5478137Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5478563Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5479131Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5479594Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5480049Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5480531Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5481193Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5481886Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5482818Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5483351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5484231Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5484916Z warnings.warn( 2022-12-01T10:51:56.5485687Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5486219Z warnings.warn( 2022-12-01T10:51:56.5486989Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:51:56.5487538Z warnings.warn( 2022-12-01T10:51:56.5488296Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:51:56.5488811Z warnings.warn( 2022-12-01T10:51:56.5489065Z dist init r=1, world=2 2022-12-01T10:51:56.5489316Z dist init r=0, world=2 2022-12-01T10:51:56.5489535Z ok (3.911s) 2022-12-01T10:51:56.5489840Z test_fsdp_device_id_cpu_offload (__main__.TestFSDPMisc) 2022-12-01T10:51:56.5490333Z Ensures that even if device_id is specified but we have ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30830 2022-12-01T10:51:56.5490839Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30831 2022-12-01T10:51:56.5491453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5491907Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5492467Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5492893Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5493462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5493936Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5494611Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5495077Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5495536Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5496036Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5496682Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5497381Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5498052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5498526Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5498868Z dist init r=0, world=2 2022-12-01T10:51:56.5499123Z dist init r=1, world=2 2022-12-01T10:51:56.5499366Z ok (3.410s) 2022-12-01T10:51:56.5499655Z test_fsdp_device_id_use_index_False (__main__.TestFSDPMisc) 2022-12-01T10:51:56.5500174Z Tests the FSDP ``device_id`` argument: ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30909 2022-12-01T10:51:56.5500672Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30910 2022-12-01T10:51:56.5501288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5501724Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5502297Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5502853Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5503420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5503870Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5504437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5504905Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5505341Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5505837Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5506496Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5507194Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5507702Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5508177Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5509176Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:51:56.5510435Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:51:56.5511028Z dist init r=1, world=2 2022-12-01T10:51:56.5511278Z dist init r=0, world=2 2022-12-01T10:51:56.5511576Z ok (3.310s) 2022-12-01T10:51:56.5511885Z test_fsdp_device_id_use_index_True (__main__.TestFSDPMisc) 2022-12-01T10:51:56.5512355Z Tests the FSDP ``device_id`` argument: ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30988 2022-12-01T10:51:56.5512853Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30989 2022-12-01T10:51:56.5513469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5513904Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5514478Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5514952Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5515527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5515955Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5516523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5516988Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5517426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5517925Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5518591Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5519285Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5519866Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5520347Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5521349Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:51:56.5523067Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:51:56.5523713Z dist init r=1, world=2 2022-12-01T10:51:56.5523948Z dist init r=0, world=2 2022-12-01T10:51:56.5524189Z ok (3.410s) 2022-12-01T10:51:56.5524687Z test_fsdp_module_no_compute_grad_use_second_layer_False_sharding_strategy_None (__main__.TestFSDPMisc) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31067 2022-12-01T10:51:56.5525253Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31068 2022-12-01T10:51:56.5525880Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5526333Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5526909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5527367Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5527943Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5528394Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5529044Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5529534Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5529993Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5530494Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5531141Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5531832Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5532352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5532827Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5533683Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5534241Z warnings.warn( 2022-12-01T10:51:56.5534989Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5535536Z warnings.warn( 2022-12-01T10:51:56.5536285Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:51:56.5536998Z warnings.warn( 2022-12-01T10:51:56.5537831Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:51:56.5538426Z warnings.warn( 2022-12-01T10:51:56.5538670Z dist init r=0, world=2 2022-12-01T10:51:56.5538931Z dist init r=1, world=2 2022-12-01T10:51:56.5539185Z ok (3.911s) 2022-12-01T10:51:56.5539729Z test_fsdp_module_no_compute_grad_use_second_layer_False_sharding_strategy_ShardingStrategy_NO_SHARD (__main__.TestFSDPMisc) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31150 2022-12-01T10:51:56.5540392Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31151 2022-12-01T10:51:56.5541035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5541521Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5542116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5542618Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5543236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5543692Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5544301Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5544799Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5545280Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5545796Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5546553Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5547314Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5547875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5548369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5549294Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5549878Z warnings.warn( 2022-12-01T10:51:56.5550666Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5551257Z warnings.warn( 2022-12-01T10:51:56.5551522Z dist init r=0, world=2 2022-12-01T10:51:56.5551786Z dist init r=1, world=2 2022-12-01T10:51:56.5552023Z ok (3.911s) 2022-12-01T10:51:56.5552543Z test_fsdp_module_no_compute_grad_use_second_layer_True_sharding_strategy_None (__main__.TestFSDPMisc) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31233 2022-12-01T10:51:56.5553164Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31234 2022-12-01T10:51:56.5553799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5554285Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5554900Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5555485Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5556096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5556584Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5557195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5557695Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5558156Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5558688Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5559382Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5560104Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5560676Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5561189Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5562117Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5563137Z warnings.warn( 2022-12-01T10:51:56.5563927Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5564490Z warnings.warn( 2022-12-01T10:51:56.5565371Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:51:56.5565930Z warnings.warn( 2022-12-01T10:51:56.5566697Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:51:56.5567248Z warnings.warn( 2022-12-01T10:51:56.5567499Z dist init r=0, world=2 2022-12-01T10:51:56.5567731Z dist init r=1, world=2 2022-12-01T10:51:56.5567967Z ok (3.911s) 2022-12-01T10:51:56.5568485Z test_fsdp_module_no_compute_grad_use_second_layer_True_sharding_strategy_ShardingStrategy_NO_SHARD (__main__.TestFSDPMisc) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31316 2022-12-01T10:51:56.5569081Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31317 2022-12-01T10:51:56.5569700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5570156Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5570732Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5571184Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5571760Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5572202Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5572778Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5573327Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5573784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5574287Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5574940Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5575641Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5576164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5576638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5577495Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5578051Z warnings.warn( 2022-12-01T10:51:56.5578817Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5579364Z warnings.warn( 2022-12-01T10:51:56.5579596Z dist init r=0, world=2 2022-12-01T10:51:56.5579847Z dist init r=1, world=2 2022-12-01T10:51:56.5580083Z ok (4.012s) 2022-12-01T10:51:56.5580476Z test_fsdp_namedtuple (__main__.TestFSDPMisc) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31399 2022-12-01T10:51:56.5580985Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31400 2022-12-01T10:51:56.5581597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5582055Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5582668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5583132Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5583708Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5584161Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5584745Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5585210Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5585665Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5586151Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5586810Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5587501Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5588024Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5588474Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5589330Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5589877Z warnings.warn( 2022-12-01T10:51:56.5590755Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5591302Z warnings.warn( 2022-12-01T10:51:56.5591555Z dist init r=0, world=2 2022-12-01T10:51:56.5591813Z dist init r=1, world=2 2022-12-01T10:51:56.5592034Z ok (3.410s) 2022-12-01T10:51:56.5592462Z test_fsdp_not_all_outputs_used_in_loss (__main__.TestFSDPMisc) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31478 2022-12-01T10:51:56.5592991Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31479 2022-12-01T10:51:56.5593579Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5594031Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5594591Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5595041Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5595601Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5596070Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5596657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5597117Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5597557Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5598054Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5598710Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5599387Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5600017Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5600510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5601385Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5601916Z warnings.warn( 2022-12-01T10:51:56.5603191Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5603772Z warnings.warn( 2022-12-01T10:51:56.5604704Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:51:56.5605357Z warnings.warn(msg, FutureWarning) 2022-12-01T10:51:56.5606280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
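[Editor's note] The FutureWarning above recommends torch.testing.assert_close over the deprecated assert_allclose. A minimal sketch of the replacement; the tensors here are illustrative:

    import torch
    from torch.testing import assert_close

    actual = torch.randn(8)
    expected = actual.clone()

    # deprecated since 1.12: torch.testing.assert_allclose(actual, expected)
    assert_close(actual, expected)  # raises AssertionError with a detailed diff on mismatch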
2022-12-01T10:51:56.5606945Z warnings.warn(msg, FutureWarning) 2022-12-01T10:51:56.5607746Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:51:56.5608422Z warnings.warn( 2022-12-01T10:51:56.5609174Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:51:56.5609730Z warnings.warn( 2022-12-01T10:51:56.5609983Z dist init r=1, world=2 2022-12-01T10:51:56.5610214Z dist init r=0, world=2 2022-12-01T10:51:56.5610451Z ok (4.113s) 2022-12-01T10:51:56.5610755Z test_fsdp_same_model_across_ranks (__main__.TestFSDPMisc) 2022-12-01T10:51:56.5611245Z FSDP broadcasts model from rank 0 to ensure it starts off with the same ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31561 2022-12-01T10:51:56.5611785Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31562 2022-12-01T10:51:56.5612400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5612850Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5613407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5613881Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5614460Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5614901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5615451Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5615911Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5616363Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5616845Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5617586Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5618310Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5618834Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5619288Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5620152Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5620708Z warnings.warn( 2022-12-01T10:51:56.5621471Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:51:56.5621996Z warnings.warn( 2022-12-01T10:51:56.5622246Z dist init r=0, world=2 2022-12-01T10:51:56.5622502Z dist init r=1, world=2 2022-12-01T10:51:56.5622723Z ok (3.510s) 2022-12-01T10:51:56.5623032Z test_module_device_mismatches_device_id (__main__.TestFSDPMisc) 2022-12-01T10:51:56.5623541Z Tests that specifying a ``device_id`` argument to FSDP for a GPU ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31640 2022-12-01T10:51:56.5624072Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31641 2022-12-01T10:51:56.5624669Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5625202Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5625785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5626238Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5626819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5627268Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5627842Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5628294Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5628746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5629239Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5629896Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5630573Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5631101Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5631575Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5631913Z dist init r=0, world=2 2022-12-01T10:51:56.5632166Z dist init r=1, world=2 2022-12-01T10:51:56.5632403Z ok (3.409s) 2022-12-01T10:51:56.5632687Z test_multi_device_not_supported (__main__.TestFSDPMisc) 2022-12-01T10:51:56.5633326Z Tests that wrapping a multi-device module (i.e. with submodules on ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31719 2022-12-01T10:51:56.5633875Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31720 2022-12-01T10:51:56.5634547Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5634992Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5635571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5636046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5636625Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5637050Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5637615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5638075Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5638517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5639014Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5639671Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5640356Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5640857Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5641327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5641680Z dist init r=0, world=2 2022-12-01T10:51:56.5641912Z dist init r=1, world=2 2022-12-01T10:51:56.5642150Z ok (3.510s) 2022-12-01T10:51:56.5642935Z test_no_params (__main__.TestFSDPMisc) 2022-12-01T10:51:56.5643409Z Test that device_id and cpu init work if module has no params ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31798 2022-12-01T10:51:56.5643943Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31799 2022-12-01T10:51:56.5644565Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5645020Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5645581Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5646046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5646625Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:51:56.5647066Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:51:56.5647626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:51:56.5648094Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:51:56.5648547Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:51:56.5649024Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:51:56.5649684Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5650374Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:51:56.5650894Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:51:56.5651342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:51:56.5651692Z dist init r=0, world=2 2022-12-01T10:51:56.5651946Z dist init r=1, world=2 2022-12-01T10:51:56.5652168Z ok (3.418s) 2022-12-01T10:51:56.5652316Z 2022-12-01T10:51:56.5652682Z ---------------------------------------------------------------------- 2022-12-01T10:51:56.5653036Z Ran 16 tests in 59.571s 2022-12-01T10:51:56.5653201Z 2022-12-01T10:51:56.5653297Z OK 2022-12-01T10:51:56.5653412Z 2022-12-01T10:51:56.5653539Z Generating XML reports... 2022-12-01T10:51:56.5654122Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMisc-20221201105056.xml 2022-12-01T10:51:56.5654461Z 2022-12-01T10:51:56.5654832Z ##[endgroup] 2022-12-01T10:51:56.5655407Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_misc (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_misc_rod09kex) 2022-12-01T10:51:56.5655760Z 2022-12-01T10:51:56.5656035Z Running distributed/fsdp/test_fsdp_grad_acc ... [2022-12-01 10:51:56.544689] 2022-12-01T10:51:56.5656732Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_grad_acc.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-12-01 10:51:56.544955] 2022-12-01T10:52:47.9359215Z 2022-12-01T10:52:47.9359955Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_grad_acc 2022-12-01T10:52:47.9361008Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_grad_acc (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_grad_acc_0kxqsm6a) 2022-12-01T10:52:47.9361388Z 2022-12-01T10:52:47.9364077Z Running tests... 2022-12-01T10:52:47.9365085Z ---------------------------------------------------------------------- 2022-12-01T10:52:47.9365702Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_grad_acc 2022-12-01T10:52:47.9366522Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_FULL_SHARD (__main__.TestGradAcc) 2022-12-01T10:52:47.9367587Z Tests gradient accumulation. ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:52:47.9368049Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31912 2022-12-01T10:52:47.9368512Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31913 2022-12-01T10:52:47.9369165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9369623Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9370196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9370645Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9371230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9371684Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9372276Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9372731Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9373180Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9373682Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9374319Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9375019Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9375545Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9376055Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9377421Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
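[Editor's note] The UserWarning above is FSDP asking for a device_id so that flattening and sharding run on GPU and sync_module_states=True can use GPU communication. A minimal sketch, assuming a default process group is already initialized; the wrapped module is illustrative:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    model = nn.Linear(16, 16)  # constructed on CPU
    fsdp_model = FSDP(
        model,
        device_id=torch.cuda.current_device(),  # move + flatten + shard on the local GPU
        sync_module_states=True,                 # broadcast rank 0's parameters (requires GPU communication)
    )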
2022-12-01T10:52:47.9378200Z warnings.warn( 2022-12-01T10:52:47.9379329Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:52:47.9380080Z warnings.warn( 2022-12-01T10:52:47.9380854Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9381393Z warnings.warn( 2022-12-01T10:52:47.9382135Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9382670Z warnings.warn( 2022-12-01T10:52:47.9383451Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:52:47.9384091Z warnings.warn( 2022-12-01T10:52:47.9384858Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:52:47.9385423Z warnings.warn( 2022-12-01T10:52:47.9386621Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:586: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:52:47.9387456Z (rank, world_num_valid_indices[rank]) 2022-12-01T10:52:47.9388526Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9389774Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9390374Z dist init r=0, world=2 2022-12-01T10:52:47.9390624Z dist init r=1, world=2 2022-12-01T10:52:47.9390860Z ok (6.293s) 2022-12-01T10:52:47.9391377Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_NO_SHARD (__main__.TestGradAcc) 2022-12-01T10:52:47.9392049Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31995 2022-12-01T10:52:47.9392606Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31996 2022-12-01T10:52:47.9393238Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9393673Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9394244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9394712Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9395287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9395712Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9396329Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9396798Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9397238Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9397736Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9398396Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9399087Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9399593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9400062Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9401379Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:52:47.9402136Z warnings.warn( 2022-12-01T10:52:47.9403678Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:52:47.9404417Z warnings.warn( 2022-12-01T10:52:47.9405183Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9405723Z warnings.warn( 2022-12-01T10:52:47.9406468Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9406993Z warnings.warn( 2022-12-01T10:52:47.9407880Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9409231Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9410496Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9411728Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9412324Z dist init r=1, world=2 2022-12-01T10:52:47.9412559Z dist init r=0, world=2 2022-12-01T10:52:47.9412796Z ok (4.512s) 2022-12-01T10:52:47.9413337Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestGradAcc) 2022-12-01T10:52:47.9414011Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32078 2022-12-01T10:52:47.9414487Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32079 2022-12-01T10:52:47.9415096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9415552Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9416102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9416567Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9417253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9417700Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9418254Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9418715Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9419169Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9419648Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9420308Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
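The deprecation warnings above name the public replacements for the private collectives still being called here. A small sketch of that migration, assuming an initialized multi-rank process group and illustrative tensor sizes:

```python
# Sketch of the API migration named in the deprecation warnings above.
# Assumes an initialized process group (e.g. NCCL) and CUDA tensors.
import torch
import torch.distributed as dist

world_size = dist.get_world_size()
inp = torch.ones(4, device="cuda")

# was: dist._all_gather_base(out, inp)   (private, deprecated)
out = torch.empty(world_size * 4, device="cuda")
dist.all_gather_into_tensor(out, inp)

# was: dist._reduce_scatter_base(small, big)   (private, deprecated)
big = torch.ones(world_size * 4, device="cuda")
small = torch.empty(4, device="cuda")
dist.reduce_scatter_tensor(small, big)
```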
2022-12-01T10:52:47.9421001Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9421529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9421987Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9423200Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:52:47.9423955Z warnings.warn( 2022-12-01T10:52:47.9425123Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:52:47.9425891Z warnings.warn( 2022-12-01T10:52:47.9426639Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9427183Z warnings.warn( 2022-12-01T10:52:47.9427928Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9428464Z warnings.warn( 2022-12-01T10:52:47.9429212Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:52:47.9429762Z warnings.warn( 2022-12-01T10:52:47.9430516Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:52:47.9431062Z warnings.warn( 2022-12-01T10:52:47.9432236Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:586: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:52:47.9433156Z (rank, world_num_valid_indices[rank]) 2022-12-01T10:52:47.9434085Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:52:47.9435320Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9435930Z dist init r=0, world=2 2022-12-01T10:52:47.9436164Z dist init r=1, world=2 2022-12-01T10:52:47.9436403Z ok (4.612s) 2022-12-01T10:52:47.9436934Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_FULL_SHARD (__main__.TestGradAcc) 2022-12-01T10:52:47.9437615Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32161 2022-12-01T10:52:47.9438094Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32162 2022-12-01T10:52:47.9438702Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9439156Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9439727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9440178Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9440751Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9441198Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9441814Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9442298Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9443155Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9443655Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9444306Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9444994Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9445513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9445992Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9446327Z dist init r=1, world=2 2022-12-01T10:52:47.9446579Z dist init r=0, world=2 2022-12-01T10:52:47.9446820Z ok (3.410s) 2022-12-01T10:52:47.9447333Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_NO_SHARD (__main__.TestGradAcc) 2022-12-01T10:52:47.9447996Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32240 2022-12-01T10:52:47.9448485Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32241 2022-12-01T10:52:47.9449093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9449525Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9450219Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9450689Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9451247Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9451689Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9452258Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9452722Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9453153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9453645Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9454306Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9454997Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9455502Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9455969Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9456322Z dist init r=1, world=2 2022-12-01T10:52:47.9456555Z dist init r=0, world=2 2022-12-01T10:52:47.9456799Z ok (3.410s) 2022-12-01T10:52:47.9457336Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestGradAcc) 2022-12-01T10:52:47.9458008Z Tests gradient accumulation. ... 
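The parameterized test names above vary `cpu_offload` and `sharding_strategy`. A short sketch of how those two FSDP options are passed, with a placeholder module and an assumed already-initialized process group:

```python
# Sketch of the two FSDP knobs varied in the test names above; the wrapped
# Linear is a placeholder and a process group is assumed to be initialized.
import torch.nn as nn
from torch.distributed.fsdp import (
    CPUOffload,
    FullyShardedDataParallel as FSDP,
    ShardingStrategy,
)

fsdp_model = FSDP(
    nn.Linear(8, 8),
    cpu_offload=CPUOffload(offload_params=True),       # offload_params=True cases
    sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,  # FULL_SHARD / SHARD_GRAD_OP / NO_SHARD
)
```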
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32319 2022-12-01T10:52:47.9458485Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32320 2022-12-01T10:52:47.9459180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9459647Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9460209Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9460675Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9461254Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9461695Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9462248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9462712Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9463163Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9463658Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9464301Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9464988Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9465504Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9465954Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9466305Z dist init r=1, world=2 2022-12-01T10:52:47.9466556Z dist init r=0, world=2 2022-12-01T10:52:47.9466894Z ok (3.410s) 2022-12-01T10:52:47.9467410Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_FULL_SHARD (__main__.TestGradAcc) 2022-12-01T10:52:47.9468080Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32398 2022-12-01T10:52:47.9468567Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32399 2022-12-01T10:52:47.9469169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9469616Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9470187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9470652Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9471227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9471669Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9472239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9472697Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9473127Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9473619Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9474267Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9474931Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9475458Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9475993Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9477239Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:52:47.9477997Z warnings.warn( 2022-12-01T10:52:47.9479096Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:52:47.9479846Z warnings.warn( 2022-12-01T10:52:47.9480603Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9481142Z warnings.warn( 2022-12-01T10:52:47.9481873Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9482750Z warnings.warn( 2022-12-01T10:52:47.9483532Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:52:47.9484197Z warnings.warn( 2022-12-01T10:52:47.9484946Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:52:47.9485488Z warnings.warn( 2022-12-01T10:52:47.9486673Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:586: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:52:47.9487511Z (rank, world_num_valid_indices[rank]) 2022-12-01T10:52:47.9488431Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9489682Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9490265Z dist init r=0, world=2 2022-12-01T10:52:47.9490512Z dist init r=1, world=2 2022-12-01T10:52:47.9490749Z ok (4.612s) 2022-12-01T10:52:47.9491263Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_NO_SHARD (__main__.TestGradAcc) 2022-12-01T10:52:47.9492018Z Tests gradient accumulation. ... 
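The `use_no_sync=True` iterations in these configs exercise FSDP's `no_sync()` context, which skips gradient communication until the final micro-batch. A rough sketch of that accumulation pattern; the model, optimizer, and batches are placeholders rather than the test's fixtures:

```python
# Rough sketch of gradient accumulation with FSDP's no_sync(); the names here
# are placeholders, not the fixtures used by test_fsdp_grad_acc.py.
import torch

def accumulate(fsdp_model, optimizer, batches):
    loss_fn = torch.nn.MSELoss()
    # All but the last micro-batch run under no_sync(), so gradients are
    # accumulated locally without reduce-scatter/all-reduce communication.
    with fsdp_model.no_sync():
        for x, y in batches[:-1]:
            loss_fn(fsdp_model(x), y).backward()
    # The final micro-batch outside no_sync() triggers communication and
    # leaves the accumulated gradients ready for the optimizer step.
    x, y = batches[-1]
    loss_fn(fsdp_model(x), y).backward()
    optimizer.step()
    optimizer.zero_grad()
```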
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32481 2022-12-01T10:52:47.9492535Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32482 2022-12-01T10:52:47.9493151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9493584Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9494149Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9494593Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9495162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9495620Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9496210Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9496723Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9497157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9497652Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9498313Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9499005Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9499509Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9500064Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9501300Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:52:47.9502048Z warnings.warn( 2022-12-01T10:52:47.9503164Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:52:47.9503895Z warnings.warn( 2022-12-01T10:52:47.9504659Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9505204Z warnings.warn( 2022-12-01T10:52:47.9505952Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9506482Z warnings.warn( 2022-12-01T10:52:47.9507366Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9508683Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9509929Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9511152Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9511768Z dist init r=1, world=2 2022-12-01T10:52:47.9511999Z dist init r=0, world=2 2022-12-01T10:52:47.9512237Z ok (4.514s) 2022-12-01T10:52:47.9512775Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestGradAcc) 2022-12-01T10:52:47.9513432Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32564 2022-12-01T10:52:47.9513921Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32565 2022-12-01T10:52:47.9514532Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9514982Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9515538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9516094Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9516678Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9517127Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9517676Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9518138Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9518585Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9519061Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9519722Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:52:47.9520418Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9520942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9521401Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9522930Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:52:47.9523705Z warnings.warn( 2022-12-01T10:52:47.9524916Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:52:47.9525682Z warnings.warn( 2022-12-01T10:52:47.9526431Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9526968Z warnings.warn( 2022-12-01T10:52:47.9527713Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:52:47.9528258Z warnings.warn( 2022-12-01T10:52:47.9529008Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:52:47.9529561Z warnings.warn( 2022-12-01T10:52:47.9530322Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:52:47.9530871Z warnings.warn( 2022-12-01T10:52:47.9532043Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:586: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:327.) 2022-12-01T10:52:47.9532983Z (rank, world_num_valid_indices[rank]) 2022-12-01T10:52:47.9533908Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function decref) 2022-12-01T10:52:47.9535154Z [W python_variable.cpp:327] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function decref) 2022-12-01T10:52:47.9535766Z dist init r=1, world=2 2022-12-01T10:52:47.9535997Z dist init r=0, world=2 2022-12-01T10:52:47.9536240Z ok (4.512s) 2022-12-01T10:52:47.9536769Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_FULL_SHARD (__main__.TestGradAcc) 2022-12-01T10:52:47.9537434Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32647 2022-12-01T10:52:47.9537910Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32648 2022-12-01T10:52:47.9538518Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9538962Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9539535Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9539988Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9540572Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9541094Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9541671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9542134Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9542583Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9543074Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9543717Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9544406Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9544929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9545398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9545735Z dist init r=1, world=2 2022-12-01T10:52:47.9545984Z dist init r=0, world=2 2022-12-01T10:52:47.9546220Z ok (3.410s) 2022-12-01T10:52:47.9546728Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_NO_SHARD (__main__.TestGradAcc) 2022-12-01T10:52:47.9547388Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32726 2022-12-01T10:52:47.9547877Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32727 2022-12-01T10:52:47.9548482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9548998Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9549580Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9550046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9550607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9551051Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9551621Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9552083Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9552516Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9553016Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9553677Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9554369Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9554869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9555337Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9555690Z dist init r=1, world=2 2022-12-01T10:52:47.9555920Z dist init r=0, world=2 2022-12-01T10:52:47.9556159Z ok (3.410s) 2022-12-01T10:52:47.9556690Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestGradAcc) 2022-12-01T10:52:47.9557360Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32805 2022-12-01T10:52:47.9557900Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32806 2022-12-01T10:52:47.9558532Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9558982Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9559543Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9560008Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9560580Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:52:47.9561023Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:52:47.9561584Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:52:47.9562052Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:52:47.9562838Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:52:47.9563333Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:52:47.9563983Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9564673Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:52:47.9565196Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:52:47.9565646Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:52:47.9566112Z dist init r=0, world=2 2022-12-01T10:52:47.9566363Z dist init r=1, world=2 2022-12-01T10:52:47.9566588Z ok (3.410s) 2022-12-01T10:52:47.9566738Z 2022-12-01T10:52:47.9567017Z ---------------------------------------------------------------------- 2022-12-01T10:52:47.9567344Z Ran 12 tests in 49.517s 2022-12-01T10:52:47.9567507Z 2022-12-01T10:52:47.9567600Z OK 2022-12-01T10:52:47.9567714Z 2022-12-01T10:52:47.9567838Z Generating XML reports... 2022-12-01T10:52:47.9568419Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_grad_acc/TEST-TestGradAcc-20221201105158.xml 2022-12-01T10:52:47.9568759Z 2022-12-01T10:52:47.9569170Z ##[endgroup] 2022-12-01T10:52:47.9569765Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_grad_acc (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_grad_acc_0kxqsm6a) 2022-12-01T10:52:47.9570121Z 2022-12-01T10:52:47.9570411Z Running distributed/fsdp/test_fsdp_freezing_weights ... [2022-12-01 10:52:47.936157] 2022-12-01T10:52:47.9571131Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_freezing_weights.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
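Each test file above runs by spawning two worker processes that each join a world-size-2 process group, which is what the "Started process N", store-based-barrier, and "dist init r=N, world=2" lines record. A simplified sketch of that pattern, not the common_distributed harness itself:

```python
# Simplified two-rank setup matching the "Started process N" / "dist init r=N,
# world=2" lines; the real harness lives in torch.testing._internal.common_distributed.
import os
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # init_process_group performs the store-based barrier seen in the INFO lines.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    print(f"dist init r={rank}, world={world_size}")
    dist.destroy_process_group()

if __name__ == "__main__":
    world = 2
    mp.spawn(worker, args=(world,), nprocs=world)
```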
[2022-12-01 10:52:47.936454] 2022-12-01T10:53:31.0866608Z 2022-12-01T10:53:31.0867388Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_freezing_weights 2022-12-01T10:53:31.0868387Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_freezing_weights (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_freezing_weights_b_vbcw1m) 2022-12-01T10:53:31.0868783Z 2022-12-01T10:53:31.0870752Z Running tests... 2022-12-01T10:53:31.0871827Z ---------------------------------------------------------------------- 2022-12-01T10:53:31.0872682Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_freezing_weights 2022-12-01T10:53:31.0873544Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:53:31.0874172Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32919 2022-12-01T10:53:31.0874863Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32920 2022-12-01T10:53:31.0875536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0875992Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0876574Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0877027Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0877604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0878071Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0878659Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0879112Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0879579Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:53:31.0880105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:53:31.0880774Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0881442Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0881967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:53:31.0882885Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:53:31.0883524Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0884001Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0884902Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:53:31.0885555Z warnings.warn( 2022-12-01T10:53:31.0886313Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0886862Z warnings.warn( 2022-12-01T10:53:31.0887609Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0888179Z warnings.warn( 2022-12-01T10:53:31.0888940Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0889501Z warnings.warn( 2022-12-01T10:53:31.0889734Z dist init r=0, world=2 2022-12-01T10:53:31.0889980Z dist init r=1, world=2 2022-12-01T10:53:31.0890226Z ok (6.609s) 2022-12-01T10:53:31.0890776Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33002 2022-12-01T10:53:31.0891432Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33003 2022-12-01T10:53:31.0892080Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0892643Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0893272Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0893764Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0894351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0894797Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0895358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0895843Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0896409Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:53:31.0896906Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:53:31.0897565Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0898233Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0898753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:53:31.0899232Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:53:31.0899704Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0900175Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:53:31.0901503Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:53:31.0902271Z warnings.warn( 2022-12-01T10:53:31.0903380Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:53:31.0904130Z warnings.warn( 2022-12-01T10:53:31.0904876Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0905414Z warnings.warn( 2022-12-01T10:53:31.0906160Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0906702Z warnings.warn( 2022-12-01T10:53:31.0907444Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0907993Z warnings.warn( 2022-12-01T10:53:31.0908745Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0909293Z warnings.warn( 2022-12-01T10:53:31.0909582Z dist init r=1, world=2 2022-12-01T10:53:31.0909846Z dist init r=0, world=2 2022-12-01T10:53:31.0910084Z ok (4.912s) 2022-12-01T10:53:31.0910634Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33085 2022-12-01T10:53:31.0911283Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33086 2022-12-01T10:53:31.0911895Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0912348Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0912909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0913374Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0913948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0914388Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0914939Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0915400Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0915858Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:53:31.0916337Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:53:31.0916993Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0917764Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0918283Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:53:31.0918734Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:53:31.0919210Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0919693Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0920551Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0921104Z warnings.warn( 2022-12-01T10:53:31.0921865Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0922832Z warnings.warn( 2022-12-01T10:53:31.0923602Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0924155Z warnings.warn( 2022-12-01T10:53:31.0924909Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:53:31.0925455Z warnings.warn( 2022-12-01T10:53:31.0925680Z dist init r=0, world=2 2022-12-01T10:53:31.0925934Z dist init r=1, world=2 2022-12-01T10:53:31.0926171Z ok (4.912s) 2022-12-01T10:53:31.0926875Z test_freezing_weights_with_nested_trunk_False_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33168 2022-12-01T10:53:31.0927553Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33169 2022-12-01T10:53:31.0928170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0928619Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0929172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0929643Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0930218Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0930665Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0931220Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0931682Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0932135Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:53:31.0932613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:53:31.0933272Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0933957Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0934472Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:53:31.0935024Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:53:31.0935497Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0935980Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0937214Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:53:31.0937969Z warnings.warn( 2022-12-01T10:53:31.0939062Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
2022-12-01T10:53:31.0939818Z warnings.warn( 2022-12-01T10:53:31.0940572Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0941109Z warnings.warn( 2022-12-01T10:53:31.0941835Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0942373Z warnings.warn( 2022-12-01T10:53:31.0943200Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0943766Z warnings.warn( 2022-12-01T10:53:31.0944507Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0945043Z warnings.warn( 2022-12-01T10:53:31.0945289Z dist init r=1, world=2 2022-12-01T10:53:31.0945536Z dist init r=0, world=2 2022-12-01T10:53:31.0945757Z ok (4.913s) 2022-12-01T10:53:31.0946312Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33251 2022-12-01T10:53:31.0946961Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33252 2022-12-01T10:53:31.0947556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0948006Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0948578Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0949043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0949601Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0950042Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0950608Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0951134Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0951582Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:53:31.0952080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:53:31.0952738Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0953415Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:53:31.0953935Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:53:31.0954403Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:53:31.0954881Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0955353Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0956230Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0956781Z warnings.warn( 2022-12-01T10:53:31.0957541Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0958065Z warnings.warn( 2022-12-01T10:53:31.0958826Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0959377Z warnings.warn( 2022-12-01T10:53:31.0960193Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0960737Z warnings.warn( 2022-12-01T10:53:31.0960984Z dist init r=0, world=2 2022-12-01T10:53:31.0961231Z dist init r=1, world=2 2022-12-01T10:53:31.0961449Z ok (5.013s) 2022-12-01T10:53:31.0962004Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_GradToNone_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33334 2022-12-01T10:53:31.0963032Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33335 2022-12-01T10:53:31.0963655Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0964092Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0964665Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0965135Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0965691Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0966129Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0966699Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0967158Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0967591Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:53:31.0968089Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:53:31.0968853Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0969545Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0970051Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:53:31.0970514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:53:31.0970991Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0971457Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0972694Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:53:31.0973453Z warnings.warn( 2022-12-01T10:53:31.0974560Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:53:31.0975301Z warnings.warn( 2022-12-01T10:53:31.0976040Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:53:31.0976586Z warnings.warn( 2022-12-01T10:53:31.0977437Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0977998Z warnings.warn( 2022-12-01T10:53:31.0978751Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0979294Z warnings.warn( 2022-12-01T10:53:31.0980045Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0980596Z warnings.warn( 2022-12-01T10:53:31.0980822Z dist init r=0, world=2 2022-12-01T10:53:31.0981069Z dist init r=1, world=2 2022-12-01T10:53:31.0981307Z ok (5.113s) 2022-12-01T10:53:31.0981856Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_False (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33447 2022-12-01T10:53:31.0982509Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33448 2022-12-01T10:53:31.0983117Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0983568Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0984110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0984554Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0985206Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0985676Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0986248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.0986712Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.0987163Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:53:31.0987641Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:53:31.0988302Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0988989Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.0989514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:53:31.0989969Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:53:31.0990439Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.0990925Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T10:53:31.0991804Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0992332Z warnings.warn( 2022-12-01T10:53:31.0993077Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.0993664Z warnings.warn( 2022-12-01T10:53:31.0994490Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0995040Z warnings.warn( 2022-12-01T10:53:31.0995798Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.0996343Z warnings.warn( 2022-12-01T10:53:31.0996587Z dist init r=0, world=2 2022-12-01T10:53:31.0996818Z dist init r=1, world=2 2022-12-01T10:53:31.0997055Z ok (4.913s) 2022-12-01T10:53:31.0997611Z test_freezing_weights_with_nested_trunk_True_freezing_method_FreezingMethod_RequiresGrad_freeze_after_wrap_fsdp_True (__main__.TestFreezingWeights) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33530 2022-12-01T10:53:31.0998253Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33531 2022-12-01T10:53:31.0998862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.0999308Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.0999879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.1000330Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.1000902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:53:31.1001345Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:53:31.1001894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:53:31.1002850Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:53:31.1003308Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:53:31.1003802Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:53:31.1004453Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:53:31.1005137Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:53:31.1005658Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:53:31.1006128Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:53:31.1006598Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.1007079Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:53:31.1008312Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:53:31.1009063Z warnings.warn( 2022-12-01T10:53:31.1010151Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:53:31.1010898Z warnings.warn( 2022-12-01T10:53:31.1011740Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.1012302Z warnings.warn( 2022-12-01T10:53:31.1013056Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:53:31.1013580Z warnings.warn( 2022-12-01T10:53:31.1014345Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.1014899Z warnings.warn( 2022-12-01T10:53:31.1015655Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:53:31.1016198Z warnings.warn( 2022-12-01T10:53:31.1016444Z dist init r=1, world=2 2022-12-01T10:53:31.1016690Z dist init r=0, world=2 2022-12-01T10:53:31.1016910Z ok (4.913s) 2022-12-01T10:53:31.1017057Z 2022-12-01T10:53:31.1017325Z ---------------------------------------------------------------------- 2022-12-01T10:53:31.1017654Z Ran 8 tests in 41.299s 2022-12-01T10:53:31.1017817Z 2022-12-01T10:53:31.1017909Z OK 2022-12-01T10:53:31.1018025Z 2022-12-01T10:53:31.1018147Z Generating XML reports... 
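Note on the UserWarnings repeated throughout the TestFreezingWeights group above: they come from internal use of the private collectives torch.distributed._all_gather_base and torch.distributed._reduce_scatter_base, and the messages point at the public replacements all_gather_into_tensor and reduce_scatter_tensor. A minimal sketch of the replacement calls, assuming an already-initialized process group and a CUDA device; the tensor names and sizes are illustrative, not taken from the test:

    import torch
    import torch.distributed as dist

    # Assumes dist.init_process_group(...) has already been called on every rank.
    world_size = dist.get_world_size()
    local = torch.ones(4, device="cuda")                 # this rank's shard (illustrative)
    gathered = torch.empty(world_size * 4, device="cuda")
    # Public replacement for the deprecated torch.distributed._all_gather_base
    dist.all_gather_into_tensor(gathered, local)

    full = torch.ones(world_size * 4, device="cuda")     # full tensor to reduce (illustrative)
    shard = torch.empty(4, device="cuda")
    # Public replacement for the deprecated torch.distributed._reduce_scatter_base
    dist.reduce_scatter_tensor(shard, full)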
2022-12-01T10:53:31.1018773Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_freezing_weights/TEST-TestFreezingWeights-20221201105249.xml 2022-12-01T10:53:31.1019240Z 2022-12-01T10:53:31.1019654Z ##[endgroup] 2022-12-01T10:53:31.1020294Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_freezing_weights (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_freezing_weights_b_vbcw1m) 2022-12-01T10:53:31.1020680Z 2022-12-01T10:53:31.1020956Z Running distributed/fsdp/test_fsdp_exec_order ... [2022-12-01 10:53:31.086764] 2022-12-01T10:53:31.1021644Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_exec_order.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:53:31.087028] 2022-12-01T10:54:06.1522630Z 2022-12-01T10:54:06.1523528Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_exec_order 2022-12-01T10:54:06.1524519Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_exec_order (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_exec_order__9m5sy9v) 2022-12-01T10:54:06.1526612Z 2022-12-01T10:54:06.1526982Z Running tests... 2022-12-01T10:54:06.1527549Z ---------------------------------------------------------------------- 2022-12-01T10:54:06.1528134Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_exec_order 2022-12-01T10:54:06.1528687Z test_invalid_first_iter_order_sharding_strategy_ShardingStrategy_FULL_SHARD (__main__.TestFSDPExecOrder) 2022-12-01T10:54:06.1530587Z Tests that FSDP errors if the all-gather order differs across ranks ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:54:06.1531129Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33678 2022-12-01T10:54:06.1531583Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33679 2022-12-01T10:54:06.1532238Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1548192Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1548960Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1549695Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1550357Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1550846Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1551461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1551994Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1552467Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:54:06.1553007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:54:06.1553720Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1554448Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:54:06.1555015Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:06.1555527Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:06.1556843Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1557875Z warnings.warn( 2022-12-01T10:54:06.1559090Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1559903Z warnings.warn( 2022-12-01T10:54:06.1560729Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1561329Z warnings.warn( 2022-12-01T10:54:06.1562137Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1563426Z warnings.warn( 2022-12-01T10:54:06.1563673Z dist init r=0, world=2 2022-12-01T10:54:06.1563932Z dist init r=1, world=2 2022-12-01T10:54:06.1564175Z ok (5.494s) 2022-12-01T10:54:06.1564553Z test_invalid_first_iter_order_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestFSDPExecOrder) 2022-12-01T10:54:06.1565296Z Tests that FSDP errors if the all-gather order differs across ranks ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33757 2022-12-01T10:54:06.1565837Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33758 2022-12-01T10:54:06.1566427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1566885Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1567461Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1567937Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1568619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1569097Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1569677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1570148Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1570585Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:54:06.1571187Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:54:06.1571852Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1572551Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1573080Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:06.1573533Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:06.1574763Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1575528Z warnings.warn( 2022-12-01T10:54:06.1576641Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1577503Z warnings.warn( 2022-12-01T10:54:06.1578256Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1578801Z warnings.warn( 2022-12-01T10:54:06.1579546Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1580090Z warnings.warn( 2022-12-01T10:54:06.1580320Z dist init r=1, world=2 2022-12-01T10:54:06.1580573Z dist init r=0, world=2 2022-12-01T10:54:06.1580812Z ok (3.911s) 2022-12-01T10:54:06.1581217Z test_invalid_later_iter_order_sharding_strategy_ShardingStrategy_FULL_SHARD_iters_before_path_change_1 (__main__.TestFSDPExecOrder) 2022-12-01T10:54:06.1581975Z Tests that FSDP warns the user if the all-gather order changes after ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33836 2022-12-01T10:54:06.1582522Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33837 2022-12-01T10:54:06.1583134Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1583568Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1584143Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1584616Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1585269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1585709Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1586281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1586745Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1587177Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:54:06.1587678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:54:06.1588337Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1589037Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1589550Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:06.1590014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:06.1591282Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1592042Z warnings.warn( 2022-12-01T10:54:06.1593151Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
2022-12-01T10:54:06.1593964Z warnings.warn( 2022-12-01T10:54:06.1594723Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1595263Z warnings.warn( 2022-12-01T10:54:06.1596010Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1596535Z warnings.warn( 2022-12-01T10:54:06.1597308Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1597852Z warnings.warn( 2022-12-01T10:54:06.1598601Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1599120Z warnings.warn( 2022-12-01T10:54:06.1599872Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1600408Z warnings.warn( 2022-12-01T10:54:06.1601161Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1601695Z warnings.warn( 2022-12-01T10:54:06.1602010Z dist init r=0, world=2 2022-12-01T10:54:06.1602272Z dist init r=1, world=2 2022-12-01T10:54:06.1602953Z ok (3.911s) 2022-12-01T10:54:06.1603453Z test_invalid_later_iter_order_sharding_strategy_ShardingStrategy_FULL_SHARD_iters_before_path_change_3 (__main__.TestFSDPExecOrder) 2022-12-01T10:54:06.1604216Z Tests that FSDP warns the user if the all-gather order changes after ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33919 2022-12-01T10:54:06.1604758Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33920 2022-12-01T10:54:06.1605346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1605796Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1606380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1606855Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1607421Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1607867Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1608436Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1608884Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1609340Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:54:06.1609838Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:54:06.1610634Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1611310Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1611838Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:06.1612311Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:06.1613538Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1614291Z warnings.warn( 2022-12-01T10:54:06.1615404Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1616158Z warnings.warn( 2022-12-01T10:54:06.1616918Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1617460Z warnings.warn( 2022-12-01T10:54:06.1618188Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1618727Z warnings.warn( 2022-12-01T10:54:06.1619573Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1620138Z warnings.warn( 2022-12-01T10:54:06.1620876Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1621429Z warnings.warn( 2022-12-01T10:54:06.1622178Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1622730Z warnings.warn( 2022-12-01T10:54:06.1623473Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1624019Z warnings.warn( 2022-12-01T10:54:06.1624270Z dist init r=0, world=2 2022-12-01T10:54:06.1624506Z dist init r=1, world=2 2022-12-01T10:54:06.1624744Z ok (3.911s) 2022-12-01T10:54:06.1625173Z test_invalid_later_iter_order_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_iters_before_path_change_1 (__main__.TestFSDPExecOrder) 2022-12-01T10:54:06.1625929Z Tests that FSDP warns the user if the all-gather order changes after ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34002 2022-12-01T10:54:06.1626453Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34003 2022-12-01T10:54:06.1627147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1627598Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1628162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1628635Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1629217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1629663Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1630217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1630683Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1631139Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:54:06.1631642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:54:06.1632287Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1632981Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:54:06.1633507Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:06.1633962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:06.1635186Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1635955Z warnings.warn( 2022-12-01T10:54:06.1637156Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1637919Z warnings.warn( 2022-12-01T10:54:06.1638689Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1639220Z warnings.warn( 2022-12-01T10:54:06.1639973Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1640512Z warnings.warn( 2022-12-01T10:54:06.1641276Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1641806Z warnings.warn( 2022-12-01T10:54:06.1643095Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1643666Z warnings.warn( 2022-12-01T10:54:06.1644436Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1645087Z warnings.warn( 2022-12-01T10:54:06.1645841Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1646376Z warnings.warn( 2022-12-01T10:54:06.1646608Z dist init r=1, world=2 2022-12-01T10:54:06.1646862Z dist init r=0, world=2 2022-12-01T10:54:06.1647102Z ok (3.912s) 2022-12-01T10:54:06.1647509Z test_invalid_later_iter_order_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_iters_before_path_change_3 (__main__.TestFSDPExecOrder) 2022-12-01T10:54:06.1648261Z Tests that FSDP warns the user if the all-gather order changes after ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34085 2022-12-01T10:54:06.1648810Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34086 2022-12-01T10:54:06.1649423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1649857Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1650434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1650902Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1651480Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1651904Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1652473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1652941Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1653376Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:54:06.1654122Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1654681Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:54:06.1655336Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1655838Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:06.1656312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:06.1657533Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1658294Z warnings.warn( 2022-12-01T10:54:06.1659411Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1660144Z warnings.warn( 2022-12-01T10:54:06.1660905Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1661530Z warnings.warn( 2022-12-01T10:54:06.1662289Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1662813Z warnings.warn( 2022-12-01T10:54:06.1663578Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1664126Z warnings.warn( 2022-12-01T10:54:06.1664879Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1665402Z warnings.warn( 2022-12-01T10:54:06.1666152Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:54:06.1666704Z warnings.warn( 2022-12-01T10:54:06.1667453Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:54:06.1667975Z warnings.warn( 2022-12-01T10:54:06.1668225Z dist init r=1, world=2 2022-12-01T10:54:06.1668478Z dist init r=0, world=2 2022-12-01T10:54:06.1668700Z ok (4.012s) 2022-12-01T10:54:06.1669181Z test_train_eval_sharding_strategy_ShardingStrategy_FULL_SHARD (__main__.TestFSDPExecOrder) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34168 2022-12-01T10:54:06.1669761Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34169 2022-12-01T10:54:06.1670434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1670883Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1671463Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1671933Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1672517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1672944Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1673522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1673990Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1674432Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:54:06.1674931Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:54:06.1675601Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1676299Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:54:06.1676801Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:06.1677274Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:06.1678498Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1679341Z warnings.warn( 2022-12-01T10:54:06.1680455Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1681192Z warnings.warn( 2022-12-01T10:54:06.1681443Z dist init r=0, world=2 2022-12-01T10:54:06.1681695Z dist init r=1, world=2 2022-12-01T10:54:06.1681920Z ok (4.011s) 2022-12-01T10:54:06.1682783Z test_train_eval_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestFSDPExecOrder) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34251 2022-12-01T10:54:06.1683514Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34252 2022-12-01T10:54:06.1684135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1684568Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1685146Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1685615Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1686193Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:06.1686625Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:06.1687202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:06.1687773Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:06.1688230Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:54:06.1688730Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:54:06.1689391Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:54:06.1690082Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:54:06.1690591Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:06.1691116Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:06.1692344Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1693110Z warnings.warn( 2022-12-01T10:54:06.1694223Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:54:06.1695048Z warnings.warn( 2022-12-01T10:54:06.1695298Z dist init r=0, world=2 2022-12-01T10:54:06.1695551Z dist init r=1, world=2 2022-12-01T10:54:06.1695778Z ok (4.012s) 2022-12-01T10:54:06.1695930Z 2022-12-01T10:54:06.1696206Z ---------------------------------------------------------------------- 2022-12-01T10:54:06.1696541Z Ran 8 tests in 33.175s 2022-12-01T10:54:06.1696705Z 2022-12-01T10:54:06.1696799Z OK 2022-12-01T10:54:06.1696914Z 2022-12-01T10:54:06.1697040Z Generating XML reports... 2022-12-01T10:54:06.1697653Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_exec_order/TEST-TestFSDPExecOrder-20221201105332.xml 2022-12-01T10:54:06.1698013Z 2022-12-01T10:54:06.1698617Z ##[endgroup] 2022-12-01T10:54:06.1699237Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_exec_order (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_exec_order__9m5sy9v) 2022-12-01T10:54:06.1699604Z 2022-12-01T10:54:06.1699918Z Running distributed/algorithms/ddp_comm_hooks/test_ddp_hooks ... [2022-12-01 10:54:06.152480] 2022-12-01T10:54:06.1700668Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:54:06.152783] 2022-12-01T10:54:38.6245161Z 2022-12-01T10:54:38.6245759Z Expand the folded group to see the log file of distributed/algorithms/ddp_comm_hooks/test_ddp_hooks 2022-12-01T10:54:38.6249543Z ##[group]PRINTING LOG FILE of distributed/algorithms/ddp_comm_hooks/test_ddp_hooks (/var/lib/jenkins/workspace/test/test-reports/distributed-algorithms-ddp_comm_hooks-test_ddp_hooks_qamj9ftx) 2022-12-01T10:54:38.6249995Z 2022-12-01T10:54:38.6250110Z Running tests... 2022-12-01T10:54:38.6250639Z ---------------------------------------------------------------------- 2022-12-01T10:54:38.6251247Z Test results will be stored in test-reports/python-unittest/distributed.algorithms.ddp_comm_hooks.test_ddp_hooks 2022-12-01T10:54:38.6251798Z test_ddp_comm_hook_allreduce_hook (__main__.DistributedDataParallelCommHookTest) 2022-12-01T10:54:38.6253272Z This unit test verifies the ``allreduce`` hook registered case gives same result ... 
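Note on the FSDP UserWarning repeated in the exec-order group above: it recommends passing the device_id argument so that flattening and sharding run on the GPU rather than the CPU, and so that sync_module_states=True can use GPU communication. A minimal sketch of that recommendation, assuming a per-rank CUDA device and an initialized default process group; the module and rank below are placeholders, not the test's model:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes dist.init_process_group(...) has been called; in the tests the rank
    # comes from the spawned worker process. Here it is a placeholder.
    rank = 0
    torch.cuda.set_device(rank)
    model = nn.Linear(8, 8)  # placeholder module, constructed on CPU
    # Passing device_id lets FSDP move the module to the GPU itself, avoiding the
    # "flattening and sharding run on CPU" warning seen in the log.
    fsdp_model = FSDP(model, device_id=torch.cuda.current_device(), sync_module_states=True)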
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:54:38.6253818Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34369 2022-12-01T10:54:38.6254277Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34370 2022-12-01T10:54:38.6254991Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6255856Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6257105Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6257952Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6258570Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6259029Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6259619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6260087Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6260510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:38.6260983Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:38.6261492Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1t46hkqv 2022-12-01T10:54:38.6262028Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1t46hkqv/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6262572Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkoj0_h99 2022-12-01T10:54:38.6263475Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkoj0_h99/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6265272Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:54:38.6266582Z warnings.warn(msg, FutureWarning) 2022-12-01T10:54:38.6267514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:54:38.6268180Z warnings.warn(msg, FutureWarning) 2022-12-01T10:54:38.6268450Z ok (6.069s) 2022-12-01T10:54:38.6268828Z test_ddp_comm_hook_fp16compress_hook (__main__.DistributedDataParallelCommHookTest) 2022-12-01T10:54:38.6269473Z This unit test verifies the ``fp16 compress`` hook registered case ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34453 2022-12-01T10:54:38.6269988Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34454 2022-12-01T10:54:38.6270602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6271050Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6271604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6272070Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6272650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6273097Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6273747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6274238Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6274672Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:38.6275141Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:38.6275626Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx367gwuh 2022-12-01T10:54:38.6276166Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx367gwuh/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6276699Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxvwn4c48 2022-12-01T10:54:38.6277214Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxvwn4c48/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6278279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:54:38.6278947Z warnings.warn(msg, FutureWarning) 2022-12-01T10:54:38.6279875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:54:38.6280526Z warnings.warn(msg, FutureWarning) 2022-12-01T10:54:38.6280777Z ok (4.513s) 2022-12-01T10:54:38.6281136Z test_ddp_comm_hook_noop_hook (__main__.DistributedDataParallelCommHookTest) 2022-12-01T10:54:38.6281807Z This unit test verifies the ``noop`` hook registered case and a subsequent allreduce ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34537 2022-12-01T10:54:38.6282346Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34538 2022-12-01T10:54:38.6283501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6283957Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6284533Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6284984Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6285560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6286003Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6286560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6287027Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6287462Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:38.6287932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:38.6288460Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpexlyyebn 2022-12-01T10:54:38.6289006Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpexlyyebn/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6289540Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplxjtuqcf 2022-12-01T10:54:38.6290075Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplxjtuqcf/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6291239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:54:38.6291926Z warnings.warn(msg, FutureWarning) 2022-12-01T10:54:38.6292857Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:54:38.6293503Z warnings.warn(msg, FutureWarning) 2022-12-01T10:54:38.6293753Z ok (4.512s) 2022-12-01T10:54:38.6294136Z test_ddp_comm_hook_quantize_per_channel_hook (__main__.DistributedDataParallelCommHookTest) 2022-12-01T10:54:38.6294723Z This unit test verifies the ``quantize per channel`` hook registered case ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34621 2022-12-01T10:54:38.6295270Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34622 2022-12-01T10:54:38.6295867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6296317Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6296891Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6297340Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6297921Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6298364Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6298929Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6299479Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6299919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:38.6300388Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:38.6300866Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcm_fv489 2022-12-01T10:54:38.6301405Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcm_fv489/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6301932Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg349djtx 2022-12-01T10:54:38.6302469Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg349djtx/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6303502Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:54:38.6304176Z warnings.warn(msg, FutureWarning) 2022-12-01T10:54:38.6305104Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:54:38.6305753Z warnings.warn(msg, FutureWarning) 2022-12-01T10:54:38.6306001Z ok (4.512s) 2022-12-01T10:54:38.6306379Z test_ddp_comm_hook_quantize_per_tensor_hook (__main__.DistributedDataParallelCommHookTest) 2022-12-01T10:54:38.6306972Z This unit test verifies the ``quantize per tensor`` hook registered case ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34705 2022-12-01T10:54:38.6307514Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34706 2022-12-01T10:54:38.6308166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6308630Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6309205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6309676Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6310236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6310675Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6311242Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6311685Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6312123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:38.6312595Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:38.6313101Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpve3m2f5p 2022-12-01T10:54:38.6313628Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpve3m2f5p/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6314162Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp56j3y5tv 2022-12-01T10:54:38.6314695Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp56j3y5tv/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6315747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:54:38.6316471Z warnings.warn(msg, FutureWarning) 2022-12-01T10:54:38.6317416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-12-01T10:54:38.6318077Z warnings.warn(msg, FutureWarning) 2022-12-01T10:54:38.6318342Z ok (4.411s) 2022-12-01T10:54:38.6318789Z test_is_last_hook (__main__.DistributedDataParallelCommHookTest) ... 
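The FutureWarning repeated in the log above points at torch.testing.assert_close() as the replacement for the deprecated torch.testing.assert_allclose(). A minimal sketch, using hypothetical tensors and tolerances:

    import torch
    from torch.testing import assert_close

    actual = torch.tensor([1.0, 2.0, 3.0])
    expected = actual + 1e-7  # tiny hypothetical difference

    assert_close(actual, expected)                       # tolerances picked from the dtype
    assert_close(actual, expected, rtol=0.0, atol=1e-6)  # or spelled out explicitly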
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34789 2022-12-01T10:54:38.6319349Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34790 2022-12-01T10:54:38.6319958Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6320392Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6320971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6321439Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6322017Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:54:38.6322913Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:54:38.6323587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:54:38.6324050Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:54:38.6324485Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:54:38.6324938Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:54:38.6325440Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf19gmcfb 2022-12-01T10:54:38.6326106Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf19gmcfb/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6326647Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5_predv2 2022-12-01T10:54:38.6327183Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5_predv2/_remote_module_non_scriptable.py 2022-12-01T10:54:38.6327560Z ok (6.616s) 2022-12-01T10:54:38.6327708Z 2022-12-01T10:54:38.6327986Z ---------------------------------------------------------------------- 2022-12-01T10:54:38.6328297Z Ran 6 tests in 30.635s 2022-12-01T10:54:38.6328460Z 2022-12-01T10:54:38.6328553Z OK 2022-12-01T10:54:38.6328685Z 2022-12-01T10:54:38.6328807Z Generating XML reports... 2022-12-01T10:54:38.6329512Z Generated XML report: test-reports/python-unittest/distributed.algorithms.ddp_comm_hooks.test_ddp_hooks/TEST-DistributedDataParallelCommHookTest-20221201105407.xml 2022-12-01T10:54:38.6329966Z 2022-12-01T10:54:38.6330296Z ##[endgroup] 2022-12-01T10:54:38.6330975Z FINISHED PRINTING LOG FILE of distributed/algorithms/ddp_comm_hooks/test_ddp_hooks (/var/lib/jenkins/workspace/test/test-reports/distributed-algorithms-ddp_comm_hooks-test_ddp_hooks_qamj9ftx) 2022-12-01T10:54:38.6331386Z 2022-12-01T10:54:38.6331653Z Running distributed/fsdp/test_fsdp_clip_grad_norm ... [2022-12-01 10:54:38.624607] 2022-12-01T10:54:38.6332348Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_clip_grad_norm.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:54:38.624870] 2022-12-01T10:55:45.1517072Z 2022-12-01T10:55:45.1517832Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_clip_grad_norm 2022-12-01T10:55:45.1518923Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_clip_grad_norm (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_clip_grad_norm_a3b18ex2) 2022-12-01T10:55:45.1522389Z 2022-12-01T10:55:45.1523533Z Running tests... 
2022-12-01T10:55:45.1524197Z ---------------------------------------------------------------------- 2022-12-01T10:55:45.1524796Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_clip_grad_norm 2022-12-01T10:55:45.1525306Z test_fsdp_calc_grad_norm_norm_type_1_3_nested_fsdp_False (__main__.TestCalcuGradNorm) 2022-12-01T10:55:45.1527498Z Test grad norm cal API. ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:55:45.1527969Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34908 2022-12-01T10:55:45.1528431Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34909 2022-12-01T10:55:45.1529121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1529601Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1530187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1530671Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1531264Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1531697Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1532272Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1532747Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1533207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1533694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1534367Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1535257Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1535821Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1536279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1537164Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1537786Z warnings.warn( 2022-12-01T10:55:45.1538551Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1539112Z warnings.warn( 2022-12-01T10:55:45.1539874Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:55:45.1540435Z warnings.warn( 2022-12-01T10:55:45.1541196Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1541738Z warnings.warn( 2022-12-01T10:55:45.1541975Z dist init r=1, world=2 2022-12-01T10:55:45.1542235Z dist init r=0, world=2 2022-12-01T10:55:45.1542472Z ok (5.392s) 2022-12-01T10:55:45.1542804Z test_fsdp_calc_grad_norm_norm_type_1_3_nested_fsdp_True (__main__.TestCalcuGradNorm) 2022-12-01T10:55:45.1543423Z Test grad norm cal API. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34991 2022-12-01T10:55:45.1543916Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34992 2022-12-01T10:55:45.1544537Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1545004Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1545829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1546304Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1546867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1547317Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1547882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1548345Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1548789Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1549288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1549944Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1550609Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1551133Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1551606Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1552538Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1553081Z warnings.warn( 2022-12-01T10:55:45.1553841Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1554389Z warnings.warn( 2022-12-01T10:55:45.1555158Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:55:45.1555689Z warnings.warn( 2022-12-01T10:55:45.1556444Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1556996Z warnings.warn( 2022-12-01T10:55:45.1557247Z dist init r=0, world=2 2022-12-01T10:55:45.1557482Z dist init r=1, world=2 2022-12-01T10:55:45.1557722Z ok (3.911s) 2022-12-01T10:55:45.1558066Z test_fsdp_calc_grad_norm_norm_type_2_0_nested_fsdp_False (__main__.TestCalcuGradNorm) 2022-12-01T10:55:45.1558539Z Test grad norm cal API. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35074 2022-12-01T10:55:45.1559020Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35075 2022-12-01T10:55:45.1559633Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1560083Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1560626Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1561148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1561727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1562176Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1563139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1563607Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1564061Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1564543Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1565201Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1565898Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1566422Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1566876Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1567739Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1568285Z warnings.warn( 2022-12-01T10:55:45.1569037Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1569565Z warnings.warn( 2022-12-01T10:55:45.1570413Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:55:45.1570985Z warnings.warn( 2022-12-01T10:55:45.1571744Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1572267Z warnings.warn( 2022-12-01T10:55:45.1572515Z dist init r=1, world=2 2022-12-01T10:55:45.1572767Z dist init r=0, world=2 2022-12-01T10:55:45.1572988Z ok (3.911s) 2022-12-01T10:55:45.1573331Z test_fsdp_calc_grad_norm_norm_type_2_0_nested_fsdp_True (__main__.TestCalcuGradNorm) 2022-12-01T10:55:45.1573819Z Test grad norm cal API. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35157 2022-12-01T10:55:45.1574294Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35158 2022-12-01T10:55:45.1574904Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1575355Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1575924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1576376Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1576948Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1577391Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1577961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1578509Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1578967Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1579466Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1580113Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1580802Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1581324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1581796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1582647Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1583201Z warnings.warn( 2022-12-01T10:55:45.1583994Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1584536Z warnings.warn( 2022-12-01T10:55:45.1585281Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
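The UserWarnings repeated above recommend the public collectives torch.distributed.all_gather_into_tensor and torch.distributed.reduce_scatter_tensor over the private _all_gather_base and _reduce_scatter_base helpers. A minimal sketch exercising both calls, assuming an already-initialized NCCL process group and 1-D CUDA tensors:

    import torch
    import torch.distributed as dist

    def demo_collectives(local: torch.Tensor, world_size: int) -> torch.Tensor:
        # all_gather_into_tensor: every rank's 1-D `local` concatenated into one tensor.
        gathered = torch.empty(world_size * local.numel(),
                               dtype=local.dtype, device=local.device)
        dist.all_gather_into_tensor(gathered, local)
        # reduce_scatter_tensor: sum the input across ranks, then hand each rank one shard
        # the same size as `local`.
        shard = torch.empty_like(local)
        dist.reduce_scatter_tensor(shard, gathered)
        return shard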
2022-12-01T10:55:45.1585830Z warnings.warn( 2022-12-01T10:55:45.1586587Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1587133Z warnings.warn( 2022-12-01T10:55:45.1587362Z dist init r=1, world=2 2022-12-01T10:55:45.1587673Z dist init r=0, world=2 2022-12-01T10:55:45.1587926Z ok (3.911s) 2022-12-01T10:55:45.1588255Z test_fsdp_calc_grad_norm_norm_type_2_5_nested_fsdp_False (__main__.TestCalcuGradNorm) 2022-12-01T10:55:45.1588745Z Test grad norm cal API. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35240 2022-12-01T10:55:45.1589226Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35241 2022-12-01T10:55:45.1589839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1590269Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1590841Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1591311Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1591874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1592320Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1592885Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1593344Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1593780Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1594275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1594931Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1595674Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1596200Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1596673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1597539Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1598067Z warnings.warn( 2022-12-01T10:55:45.1598820Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1599362Z warnings.warn( 2022-12-01T10:55:45.1600126Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:55:45.1600654Z warnings.warn( 2022-12-01T10:55:45.1601407Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1601953Z warnings.warn( 2022-12-01T10:55:45.1602204Z dist init r=1, world=2 2022-12-01T10:55:45.1602623Z dist init r=0, world=2 2022-12-01T10:55:45.1602872Z ok (4.011s) 2022-12-01T10:55:45.1603214Z test_fsdp_calc_grad_norm_norm_type_2_5_nested_fsdp_True (__main__.TestCalcuGradNorm) 2022-12-01T10:55:45.1603684Z Test grad norm cal API. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35323 2022-12-01T10:55:45.1604176Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35324 2022-12-01T10:55:45.1604873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1605346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1605909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1606375Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1606946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1607371Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1607939Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1608406Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1608859Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1609343Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1609998Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1610685Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1611204Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1611657Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1612520Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1613158Z warnings.warn( 2022-12-01T10:55:45.1613930Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1614458Z warnings.warn( 2022-12-01T10:55:45.1615226Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:55:45.1615777Z warnings.warn( 2022-12-01T10:55:45.1616533Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1617067Z warnings.warn( 2022-12-01T10:55:45.1617314Z dist init r=1, world=2 2022-12-01T10:55:45.1617565Z dist init r=0, world=2 2022-12-01T10:55:45.1617789Z ok (3.911s) 2022-12-01T10:55:45.1618139Z test_fsdp_calc_grad_norm_norm_type_inf_nested_fsdp_False (__main__.TestCalcuGradNorm) 2022-12-01T10:55:45.1618630Z Test grad norm cal API. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35406 2022-12-01T10:55:45.1619093Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35407 2022-12-01T10:55:45.1619702Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1620151Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1620728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1621183Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1621757Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1622272Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1622864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1623313Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1623766Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1624421Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1624944Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1625593Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1626120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1626593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1627448Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1627998Z warnings.warn( 2022-12-01T10:55:45.1628750Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1629289Z warnings.warn( 2022-12-01T10:55:45.1630037Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:55:45.1630657Z warnings.warn( 2022-12-01T10:55:45.1631524Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1632235Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1633078Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1633630Z warnings.warn( 2022-12-01T10:55:45.1634496Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1635209Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1635535Z dist init r=0, world=2 2022-12-01T10:55:45.1635785Z dist init r=1, world=2 2022-12-01T10:55:45.1636024Z ok (3.911s) 2022-12-01T10:55:45.1636374Z test_fsdp_calc_grad_norm_norm_type_inf_nested_fsdp_True (__main__.TestCalcuGradNorm) 2022-12-01T10:55:45.1636851Z Test grad norm cal API. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35489 2022-12-01T10:55:45.1637335Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35490 2022-12-01T10:55:45.1637949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1638384Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1639003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1639462Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1640034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1640486Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1641068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1641534Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1641970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1642856Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1643543Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1644234Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:55:45.1644741Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1645212Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1646081Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1646632Z warnings.warn( 2022-12-01T10:55:45.1647482Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1648018Z warnings.warn( 2022-12-01T10:55:45.1648790Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1649338Z warnings.warn( 2022-12-01T10:55:45.1650071Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1650608Z warnings.warn( 2022-12-01T10:55:45.1651468Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1652187Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1653118Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1653823Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1654167Z dist init r=1, world=2 2022-12-01T10:55:45.1654419Z dist init r=0, world=2 2022-12-01T10:55:45.1654641Z ok (3.912s) 2022-12-01T10:55:45.1655052Z test_fsdp_clip_grad_norm_norm_type_2_0_nested_fsdp_False_cpu_offload_CPUOffload(offload_params=False) (__main__.TestClipGradNorm) 2022-12-01T10:55:45.1655611Z Test FSDP with clip grad norm. ... 
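The copy-construction UserWarnings above recommend building the new tensor with clone().detach() (optionally followed by requires_grad_(True)) rather than torch.tensor(existing_tensor). A minimal sketch with a hypothetical source tensor:

    import torch

    src = torch.randn(4, requires_grad=True)  # hypothetical existing tensor

    plain_copy = src.clone().detach()                      # copy with no autograd history
    grad_copy = src.clone().detach().requires_grad_(True)  # copy that tracks new gradients
    # Discouraged pattern that triggers the warning above:
    # warned_copy = torch.tensor(src)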
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35572 2022-12-01T10:55:45.1656175Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35573 2022-12-01T10:55:45.1656798Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1657247Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1657820Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1658271Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1658846Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1659286Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1659861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1660312Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1660767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1661264Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1661906Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1662594Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1663118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1663591Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1664374Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1665031Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1665855Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1666402Z warnings.warn( 2022-12-01T10:55:45.1666993Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1667638Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1668461Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:55:45.1669003Z warnings.warn( 2022-12-01T10:55:45.1669772Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1670301Z warnings.warn( 2022-12-01T10:55:45.1671054Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1671611Z warnings.warn( 2022-12-01T10:55:45.1672493Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1673149Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1674036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1674676Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1674990Z dist init r=1, world=2 2022-12-01T10:55:45.1675223Z dist init r=0, world=2 2022-12-01T10:55:45.1675464Z ok (4.011s) 2022-12-01T10:55:45.1675877Z test_fsdp_clip_grad_norm_norm_type_2_0_nested_fsdp_False_cpu_offload_CPUOffload(offload_params=True) (__main__.TestClipGradNorm) 2022-12-01T10:55:45.1676420Z Test FSDP with clip grad norm. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35655 2022-12-01T10:55:45.1676914Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35656 2022-12-01T10:55:45.1677521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1677971Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1678527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1678998Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1679568Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1680058Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1680636Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1681098Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1681551Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1682029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1682948Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:55:45.1683676Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1684201Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1684659Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1685370Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1686023Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1686847Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1687377Z warnings.warn( 2022-12-01T10:55:45.1687979Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1688614Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1689516Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1690063Z warnings.warn( 2022-12-01T10:55:45.1690837Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1691387Z warnings.warn( 2022-12-01T10:55:45.1692143Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1692671Z warnings.warn( 2022-12-01T10:55:45.1693507Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1694164Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1695055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 
2022-12-01T10:55:45.1695704Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1696001Z dist init r=1, world=2 2022-12-01T10:55:45.1696341Z dist init r=0, world=2 2022-12-01T10:55:45.1696583Z ok (4.011s) 2022-12-01T10:55:45.1696975Z test_fsdp_clip_grad_norm_norm_type_2_0_nested_fsdp_True_cpu_offload_CPUOffload(offload_params=False) (__main__.TestClipGradNorm) 2022-12-01T10:55:45.1697537Z Test FSDP with clip grad norm. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35738 2022-12-01T10:55:45.1698029Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35739 2022-12-01T10:55:45.1698627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1699075Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1699647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1700113Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1700674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1701126Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1701695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1702157Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1702593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1703094Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1703754Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1704426Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1704955Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1705426Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1706199Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1706843Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1707502Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1708138Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1708975Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:55:45.1709521Z warnings.warn( 2022-12-01T10:55:45.1710255Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1710793Z warnings.warn( 2022-12-01T10:55:45.1711555Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1712098Z warnings.warn( 2022-12-01T10:55:45.1712836Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1713457Z warnings.warn( 2022-12-01T10:55:45.1714288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1714948Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1715811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1716463Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1716777Z dist init r=1, world=2 2022-12-01T10:55:45.1717027Z dist init r=0, world=2 2022-12-01T10:55:45.1717248Z ok (3.911s) 2022-12-01T10:55:45.1717659Z test_fsdp_clip_grad_norm_norm_type_2_0_nested_fsdp_True_cpu_offload_CPUOffload(offload_params=True) (__main__.TestClipGradNorm) 2022-12-01T10:55:45.1718212Z Test FSDP with clip grad norm. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35821 2022-12-01T10:55:45.1718686Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35822 2022-12-01T10:55:45.1719299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1719748Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1720319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1720771Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1721353Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1721852Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1722706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1723182Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1723637Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1724129Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1724776Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1725461Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1725989Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1726464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1727159Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1727808Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1728462Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1729194Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1730008Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1730555Z warnings.warn( 2022-12-01T10:55:45.1731302Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:55:45.1731844Z warnings.warn( 2022-12-01T10:55:45.1732599Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1733144Z warnings.warn( 2022-12-01T10:55:45.1733906Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1734453Z warnings.warn( 2022-12-01T10:55:45.1735265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1735921Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1736806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1737450Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1737745Z dist init r=0, world=2 2022-12-01T10:55:45.1737995Z dist init r=1, world=2 2022-12-01T10:55:45.1738323Z ok (3.910s) 2022-12-01T10:55:45.1738740Z test_fsdp_clip_grad_norm_norm_type_inf_nested_fsdp_False_cpu_offload_CPUOffload(offload_params=False) (__main__.TestClipGradNorm) 2022-12-01T10:55:45.1739299Z Test FSDP with clip grad norm. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35904 2022-12-01T10:55:45.1739792Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35905 2022-12-01T10:55:45.1740405Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1740838Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1741412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1741885Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1742465Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1742890Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1743455Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1743916Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1744348Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1744842Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1745497Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
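
The deprecation warnings above point at the public replacements for the private collectives used inside FSDP. A rough sketch of those replacements, assuming an already-initialized process group and one CUDA device per rank (the helper name and tensor shapes are invented for illustration):

    import torch
    import torch.distributed as dist

    def collective_sketch(rank: int, world_size: int) -> None:
        # Each rank contributes one shard; assumes dist.init_process_group() ran earlier.
        shard = torch.full((4,), float(rank), device=rank)

        # Public replacement for torch.distributed._all_gather_base:
        gathered = torch.empty(4 * world_size, device=rank)
        dist.all_gather_into_tensor(gathered, shard)

        # Public replacement for torch.distributed._reduce_scatter_base:
        reduced = torch.empty(4, device=rank)
        dist.reduce_scatter_tensor(reduced, gathered)
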
2022-12-01T10:55:45.1746259Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1746769Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1747241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1747955Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1748600Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1749239Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1749876Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1750707Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1751257Z warnings.warn( 2022-12-01T10:55:45.1751995Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1752539Z warnings.warn( 2022-12-01T10:55:45.1753306Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1753857Z warnings.warn( 2022-12-01T10:55:45.1754643Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1755196Z warnings.warn( 2022-12-01T10:55:45.1756023Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1756679Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1757561Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 
2022-12-01T10:55:45.1758202Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1759124Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1759834Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1760791Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1761490Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1761880Z dist init r=1, world=2 2022-12-01T10:55:45.1762134Z dist init r=0, world=2 2022-12-01T10:55:45.1762595Z ok (4.011s) 2022-12-01T10:55:45.1763020Z test_fsdp_clip_grad_norm_norm_type_inf_nested_fsdp_False_cpu_offload_CPUOffload(offload_params=True) (__main__.TestClipGradNorm) 2022-12-01T10:55:45.1763579Z Test FSDP with clip grad norm. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35987 2022-12-01T10:55:45.1764069Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35988 2022-12-01T10:55:45.1764667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1765116Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1765687Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1766161Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1766724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1767172Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1767741Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1768205Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1768644Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1769142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1769798Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1770466Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
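
For context on what these TestClipGradNorm variants exercise: FSDP exposes its own clip_grad_norm_ that operates on sharded gradients, and the norm_type_2_0 / norm_type_inf test names correspond to the norm_type argument. A hedged sketch (model size, loss, and max_norm are arbitrary; assumes the process group is already set up as in the harness above):

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def clip_step(rank: int) -> None:
        model = FSDP(nn.Linear(16, 16).to(rank))
        loss = model(torch.randn(8, 16, device=rank)).sum()
        loss.backward()
        # Clips the total gradient norm across all shards; passing
        # norm_type=float("inf") corresponds to the norm_type_inf variants.
        model.clip_grad_norm_(max_norm=1.0, norm_type=2.0)
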
2022-12-01T10:55:45.1770995Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1771579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1772318Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1772946Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1773773Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1774328Z warnings.warn( 2022-12-01T10:55:45.1774937Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1775579Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1776381Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1776923Z warnings.warn( 2022-12-01T10:55:45.1777691Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1778332Z warnings.warn( 2022-12-01T10:55:45.1779092Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1779649Z warnings.warn( 2022-12-01T10:55:45.1780471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1781121Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1781992Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1782649Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1783619Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 
2022-12-01T10:55:45.1784330Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1785267Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1785972Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1786318Z dist init r=1, world=2 2022-12-01T10:55:45.1786569Z dist init r=0, world=2 2022-12-01T10:55:45.1786792Z ok (4.011s) 2022-12-01T10:55:45.1787263Z test_fsdp_clip_grad_norm_norm_type_inf_nested_fsdp_True_cpu_offload_CPUOffload(offload_params=False) (__main__.TestClipGradNorm) 2022-12-01T10:55:45.1787834Z Test FSDP with clip grad norm. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36070 2022-12-01T10:55:45.1788330Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36071 2022-12-01T10:55:45.1788927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1789373Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1789946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1790397Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1790981Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1791425Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1791994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1792441Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1792895Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1793392Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1794031Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1794725Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1795319Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1795798Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1796498Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1797146Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1797976Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:55:45.1798522Z warnings.warn( 2022-12-01T10:55:45.1799135Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1799761Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1800581Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1801128Z warnings.warn( 2022-12-01T10:55:45.1801900Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1802673Z warnings.warn( 2022-12-01T10:55:45.1803539Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1804118Z warnings.warn( 2022-12-01T10:55:45.1804951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1805585Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1806474Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1807121Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1808045Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1808759Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1809695Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1810395Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1810740Z dist init r=0, world=2 2022-12-01T10:55:45.1811061Z dist init r=1, world=2 2022-12-01T10:55:45.1811301Z ok (3.912s) 2022-12-01T10:55:45.1811713Z test_fsdp_clip_grad_norm_norm_type_inf_nested_fsdp_True_cpu_offload_CPUOffload(offload_params=True) (__main__.TestClipGradNorm) 2022-12-01T10:55:45.1812273Z Test FSDP with clip grad norm. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36153 2022-12-01T10:55:45.1812749Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36154 2022-12-01T10:55:45.1813368Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1813815Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1814369Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1814839Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1815415Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:55:45.1815862Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:55:45.1816415Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:55:45.1816880Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:55:45.1817336Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:55:45.1817834Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:55:45.1818477Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1819162Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:55:45.1819682Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:55:45.1820132Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:55:45.1820903Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1821564Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1822220Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:51: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1822862Z in_data = torch.tensor(input[self.rank], device=self.rank) 2022-12-01T10:55:45.1823681Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:55:45.1824234Z warnings.warn( 2022-12-01T10:55:45.1824989Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:55:45.1825530Z warnings.warn( 2022-12-01T10:55:45.1826330Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1826884Z warnings.warn( 2022-12-01T10:55:45.1827641Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:55:45.1828250Z warnings.warn( 2022-12-01T10:55:45.1829071Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1829724Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1830612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1067: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1831255Z return_norm = torch.tensor(total_norm ** norm_type, device=rank) 2022-12-01T10:55:45.1832215Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1832910Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1833862Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:4328: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). 2022-12-01T10:55:45.1834571Z local_norm = torch.tensor(max(par.grad.detach().abs().max() for par in parameters)) 2022-12-01T10:55:45.1834914Z dist init r=1, world=2 2022-12-01T10:55:45.1835149Z dist init r=0, world=2 2022-12-01T10:55:45.1835392Z ok (4.011s) 2022-12-01T10:55:45.1835543Z 2022-12-01T10:55:45.1835812Z ---------------------------------------------------------------------- 2022-12-01T10:55:45.1836124Z Ran 16 tests in 64.657s 2022-12-01T10:55:45.1836289Z 2022-12-01T10:55:45.1836438Z OK 2022-12-01T10:55:45.1836583Z 2022-12-01T10:55:45.1836709Z Generating XML reports... 2022-12-01T10:55:45.1837326Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_clip_grad_norm/TEST-TestCalcuGradNorm-20221201105440.xml 2022-12-01T10:55:45.1838098Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_clip_grad_norm/TEST-TestClipGradNorm-20221201105440.xml 2022-12-01T10:55:45.1838456Z 2022-12-01T10:55:45.1838847Z ##[endgroup] 2022-12-01T10:55:45.1839476Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_clip_grad_norm (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_clip_grad_norm_a3b18ex2) 2022-12-01T10:55:45.1839844Z 2022-12-01T10:55:45.1840113Z Running distributed/fsdp/test_fsdp_ignored_modules ... 
[2022-12-01 10:55:45.152196] 2022-12-01T10:55:45.1840832Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_ignored_modules.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:55:45.152497] 2022-12-01T10:56:08.2525462Z 2022-12-01T10:56:08.2526135Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_ignored_modules 2022-12-01T10:56:08.2530165Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_ignored_modules (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_ignored_modules_td2oxx1u) 2022-12-01T10:56:08.2530598Z 2022-12-01T10:56:08.2531947Z Running tests... 2022-12-01T10:56:08.2532737Z ---------------------------------------------------------------------- 2022-12-01T10:56:08.2533386Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_ignored_modules 2022-12-01T10:56:08.2533931Z test_diff_ignored_modules_across_ranks_pass_ignored_modules_to_root_False (__main__.TestFSDPIgnoredModules) 2022-12-01T10:56:08.2534688Z Tests ignoring different modules across ranks. ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:56:08.2535151Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36271 2022-12-01T10:56:08.2535608Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36272 2022-12-01T10:56:08.2536248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:08.2536723Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:08.2537287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:08.2537765Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:08.2538341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:08.2538801Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:08.2539358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:08.2539836Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:08.2540302Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:08.2540816Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:08.2546350Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:08.2547097Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:08.2547623Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:08.2548122Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:08.2549174Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:56:08.2549826Z warnings.warn( 2022-12-01T10:56:08.2550603Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:08.2551145Z warnings.warn( 2022-12-01T10:56:08.2551904Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:56:08.2552461Z warnings.warn( 2022-12-01T10:56:08.2553280Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:56:08.2553815Z warnings.warn( 2022-12-01T10:56:08.2554061Z dist init r=0, world=2 2022-12-01T10:56:08.2554313Z dist init r=1, world=2 2022-12-01T10:56:08.2554534Z ok (5.493s) 2022-12-01T10:56:08.2554922Z test_diff_ignored_modules_across_ranks_pass_ignored_modules_to_root_True (__main__.TestFSDPIgnoredModules) 2022-12-01T10:56:08.2555478Z Tests ignoring different modules across ranks. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36354 2022-12-01T10:56:08.2555998Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36355 2022-12-01T10:56:08.2556588Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:08.2557148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:08.2557722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:08.2558178Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:08.2558772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:08.2559202Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:08.2559771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:08.2560234Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:08.2560672Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:08.2561170Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:08.2561837Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:08.2562924Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:08.2563445Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:08.2563919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:08.2564799Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:08.2565347Z warnings.warn( 2022-12-01T10:56:08.2566085Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:08.2566742Z warnings.warn( 2022-12-01T10:56:08.2567537Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:56:08.2568084Z warnings.warn( 2022-12-01T10:56:08.2568824Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:56:08.2569372Z warnings.warn( 2022-12-01T10:56:08.2569622Z dist init r=0, world=2 2022-12-01T10:56:08.2569856Z dist init r=1, world=2 2022-12-01T10:56:08.2570096Z ok (4.111s) 2022-12-01T10:56:08.2570419Z test_ignored_modules_invalid (__main__.TestFSDPIgnoredModules) 2022-12-01T10:56:08.2570941Z Tests that passing an FSDP module as an ignored module or the ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36437 2022-12-01T10:56:08.2571451Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36438 2022-12-01T10:56:08.2572060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:08.2572507Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:08.2573063Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:08.2573533Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:08.2574113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:08.2574660Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:08.2575222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:08.2575691Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:08.2576147Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:08.2576625Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:08.2577286Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:08.2577978Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
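
These TestFSDPIgnoredModules cases revolve around the ignored_modules argument of the FSDP constructor, which keeps the listed submodules' parameters out of flattening and sharding. A small sketch of the idea (the module names are invented; assumes an initialized process group):

    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    class TinyModel(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.trunk = nn.Linear(16, 16)
            self.head = nn.Linear(16, 4)

        def forward(self, x):
            return self.head(self.trunk(x))

    model = TinyModel()
    # Parameters of model.head stay unmanaged by FSDP; only the trunk is sharded.
    sharded = FSDP(model, ignored_modules=[model.head])
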
2022-12-01T10:56:08.2578495Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:08.2578951Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:08.2579306Z dist init r=0, world=2 2022-12-01T10:56:08.2579557Z dist init r=1, world=2 2022-12-01T10:56:08.2579781Z ok (3.510s) 2022-12-01T10:56:08.2580101Z test_ignored_modules_nested (__main__.TestFSDPIgnoredModules) 2022-12-01T10:56:08.2580617Z Tests that passing a module with nested FSDP modules does not ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36516 2022-12-01T10:56:08.2581145Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36517 2022-12-01T10:56:08.2581741Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:08.2582235Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:08.2582822Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:08.2583284Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:08.2583907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:08.2584373Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:08.2584947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:08.2585390Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:08.2585842Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:08.2586334Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:08.2586986Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:08.2587664Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:08.2588188Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:08.2588656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:08.2589526Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:08.2590060Z warnings.warn( 2022-12-01T10:56:08.2590814Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:08.2591356Z warnings.warn( 2022-12-01T10:56:08.2592204Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:56:08.2592740Z warnings.warn( 2022-12-01T10:56:08.2593497Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:56:08.2594047Z warnings.warn( 2022-12-01T10:56:08.2594278Z dist init r=0, world=2 2022-12-01T10:56:08.2594529Z dist init r=1, world=2 2022-12-01T10:56:08.2594768Z ok (3.912s) 2022-12-01T10:56:08.2595095Z test_ignored_modules_transformer (__main__.TestFSDPIgnoredModules) 2022-12-01T10:56:08.2595744Z Tests that ignored modules' parameters are not flattened for a ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36599 2022-12-01T10:56:08.2596280Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36600 2022-12-01T10:56:08.2596890Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:08.2597322Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:08.2597897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:08.2598365Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:08.2598945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:08.2599370Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:08.2599940Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:08.2600404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:08.2600843Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:08.2601403Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:08.2602081Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:08.2603094Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:08.2603599Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:08.2604070Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:08.2604940Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:08.2605491Z warnings.warn( 2022-12-01T10:56:08.2606222Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:08.2606756Z warnings.warn( 2022-12-01T10:56:08.2607519Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 
2022-12-01T10:56:08.2608073Z warnings.warn( 2022-12-01T10:56:08.2608812Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:56:08.2609466Z warnings.warn( 2022-12-01T10:56:08.2609713Z dist init r=0, world=2 2022-12-01T10:56:08.2609963Z dist init r=1, world=2 2022-12-01T10:56:08.2610185Z ok (4.212s) 2022-12-01T10:56:08.2610337Z 2022-12-01T10:56:08.2610616Z ---------------------------------------------------------------------- 2022-12-01T10:56:08.2610949Z Ran 5 tests in 21.239s 2022-12-01T10:56:08.2611090Z 2022-12-01T10:56:08.2611183Z OK 2022-12-01T10:56:08.2611316Z 2022-12-01T10:56:08.2611441Z Generating XML reports... 2022-12-01T10:56:08.2612082Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_ignored_modules/TEST-TestFSDPIgnoredModules-20221201105546.xml 2022-12-01T10:56:08.2612468Z 2022-12-01T10:56:08.2612850Z ##[endgroup] 2022-12-01T10:56:08.2613487Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_ignored_modules (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_ignored_modules_td2oxx1u) 2022-12-01T10:56:08.2613874Z 2022-12-01T10:56:08.2614147Z Running distributed/fsdp/test_fsdp_apply ... [2022-12-01 10:56:08.252769] 2022-12-01T10:56:08.2614828Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_apply.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:56:08.253080] 2022-12-01T10:56:22.6555386Z 2022-12-01T10:56:22.6555897Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_apply 2022-12-01T10:56:22.6557101Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_apply (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_apply_6dnkwz8m) 2022-12-01T10:56:22.6557472Z 2022-12-01T10:56:22.6557586Z Running tests... 2022-12-01T10:56:22.6558082Z ---------------------------------------------------------------------- 2022-12-01T10:56:22.6558647Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_apply 2022-12-01T10:56:22.6559087Z test_apply_in_summon_raises_error (__main__.TestApply) 2022-12-01T10:56:22.6559732Z Tests that calling ``apply()`` on an FSDP instance inside the ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:56:22.6560285Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36717 2022-12-01T10:56:22.6560966Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36718 2022-12-01T10:56:22.6561661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:22.6562113Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:22.6562982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:22.6563452Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:22.6564037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:22.6564465Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:22.6565039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:22.6565508Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:22.6565983Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:22.6566746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:22.6567419Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:22.6568124Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:22.6568641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:22.6569095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:22.6570476Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:56:22.6571233Z warnings.warn( 2022-12-01T10:56:22.6572879Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:56:22.6574097Z warnings.warn( 2022-12-01T10:56:22.6574879Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 
2022-12-01T10:56:22.6575409Z warnings.warn( 2022-12-01T10:56:22.6576167Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:22.6576704Z warnings.warn( 2022-12-01T10:56:22.6576974Z File "", line 1, in 2022-12-01T10:56:22.6577325Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main 2022-12-01T10:56:22.6577697Z exitcode = _main(fd, parent_sentinel) 2022-12-01T10:56:22.6578059Z File "/opt/conda/lib/python3.10/multiprocessing/spawn.py", line 129, in _main 2022-12-01T10:56:22.6578423Z return self._bootstrap(parent_sentinel) 2022-12-01T10:56:22.6578864Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap 2022-12-01T10:56:22.6579508Z self.run() 2022-12-01T10:56:22.6579845Z File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run 2022-12-01T10:56:22.6580215Z self._target(*self._args, **self._kwargs) 2022-12-01T10:56:22.6580732Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py", line 785, in _run 2022-12-01T10:56:22.6581152Z self.run_test(test_name, pipe) 2022-12-01T10:56:22.6581661Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 622, in run_test 2022-12-01T10:56:22.6582059Z getattr(self, test_name)() 2022-12-01T10:56:22.6582663Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 503, in wrapper 2022-12-01T10:56:22.6583028Z fn() 2022-12-01T10:56:22.6583506Z File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 145, in wrapper 2022-12-01T10:56:22.6583879Z return func(*args, **kwargs) 2022-12-01T10:56:22.6584295Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_apply.py", line 101, in test_apply_in_summon_raises_error 2022-12-01T10:56:22.6584729Z transformer.apply(self._init_linear_weights) 2022-12-01T10:56:22.6585276Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1667, in apply 2022-12-01T10:56:22.6585708Z self._assert_state(TrainingState_.IDLE) 2022-12-01T10:56:22.6586272Z File "/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 3582, in _assert_state 2022-12-01T10:56:22.6586684Z traceback.print_stack() 2022-12-01T10:56:22.6586930Z dist init r=1, world=2 2022-12-01T10:56:22.6587180Z dist init r=0, world=2 2022-12-01T10:56:22.6587588Z Asserting FSDP instance is: FullyShardedDataParallel( 2022-12-01T10:56:22.6587947Z (_fsdp_wrapped_module): FlattenParamsWrapper( 2022-12-01T10:56:22.6588304Z (_fpw_module): TransformerWithSharedParams( 2022-12-01T10:56:22.6588634Z (embed_tokens): Embedding(23, 16) 2022-12-01T10:56:22.6588905Z (transformer): Transformer( 2022-12-01T10:56:22.6589192Z (encoder): TransformerEncoder( 2022-12-01T10:56:22.6589473Z (layers): ModuleList( 2022-12-01T10:56:22.6589749Z (0): FullyShardedDataParallel( 2022-12-01T10:56:22.6590098Z (_fsdp_wrapped_module): FlattenParamsWrapper( 2022-12-01T10:56:22.6590454Z (_fpw_module): TransformerEncoderLayer( 2022-12-01T10:56:22.6590789Z (self_attn): MultiheadAttention( 2022-12-01T10:56:22.6591185Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-12-01T10:56:22.6591542Z ) 2022-12-01T10:56:22.6591843Z (linear1): 
Linear(in_features=16, out_features=8, bias=True) 2022-12-01T10:56:22.6592185Z (dropout): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6592540Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2022-12-01T10:56:22.6593007Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6593444Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6593800Z (dropout1): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6594130Z (dropout2): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6594398Z ) 2022-12-01T10:56:22.6594603Z ) 2022-12-01T10:56:22.6594821Z ) 2022-12-01T10:56:22.6595093Z (1): FullyShardedDataParallel( 2022-12-01T10:56:22.6595425Z (_fsdp_wrapped_module): FlattenParamsWrapper( 2022-12-01T10:56:22.6595774Z (_fpw_module): TransformerEncoderLayer( 2022-12-01T10:56:22.6596116Z (self_attn): MultiheadAttention( 2022-12-01T10:56:22.6596519Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-12-01T10:56:22.6596948Z ) 2022-12-01T10:56:22.6597270Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2022-12-01T10:56:22.6597605Z (dropout): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6597961Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2022-12-01T10:56:22.6598419Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6598870Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6599209Z (dropout1): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6599539Z (dropout2): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6599807Z ) 2022-12-01T10:56:22.6600016Z ) 2022-12-01T10:56:22.6600232Z ) 2022-12-01T10:56:22.6600455Z ) 2022-12-01T10:56:22.6600809Z (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6601099Z ) 2022-12-01T10:56:22.6601356Z (decoder): TransformerDecoder( 2022-12-01T10:56:22.6601630Z (layers): ModuleList( 2022-12-01T10:56:22.6601919Z (0): FullyShardedDataParallel( 2022-12-01T10:56:22.6602268Z (_fsdp_wrapped_module): FlattenParamsWrapper( 2022-12-01T10:56:22.6602924Z (_fpw_module): TransformerDecoderLayer( 2022-12-01T10:56:22.6603262Z (self_attn): MultiheadAttention( 2022-12-01T10:56:22.6603678Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-12-01T10:56:22.6604031Z ) 2022-12-01T10:56:22.6604297Z (multihead_attn): MultiheadAttention( 2022-12-01T10:56:22.6604834Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-12-01T10:56:22.6605181Z ) 2022-12-01T10:56:22.6605475Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2022-12-01T10:56:22.6605828Z (dropout): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6606181Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2022-12-01T10:56:22.6606641Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6607094Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6607534Z (norm3): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6607890Z (dropout1): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6608208Z (dropout2): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6608537Z (dropout3): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6608810Z ) 2022-12-01T10:56:22.6609022Z ) 2022-12-01T10:56:22.6609233Z ) 2022-12-01T10:56:22.6609509Z (1): FullyShardedDataParallel( 2022-12-01T10:56:22.6609844Z (_fsdp_wrapped_module): 
FlattenParamsWrapper( 2022-12-01T10:56:22.6610195Z (_fpw_module): TransformerDecoderLayer( 2022-12-01T10:56:22.6610530Z (self_attn): MultiheadAttention( 2022-12-01T10:56:22.6610945Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-12-01T10:56:22.6611284Z ) 2022-12-01T10:56:22.6611570Z (multihead_attn): MultiheadAttention( 2022-12-01T10:56:22.6611992Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-12-01T10:56:22.6612323Z ) 2022-12-01T10:56:22.6612632Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2022-12-01T10:56:22.6612989Z (dropout): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6613324Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2022-12-01T10:56:22.6613929Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6614412Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6614857Z (norm3): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6615195Z (dropout1): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6615531Z (dropout2): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6615858Z (dropout3): Dropout(p=0.1, inplace=False) 2022-12-01T10:56:22.6616107Z ) 2022-12-01T10:56:22.6616334Z ) 2022-12-01T10:56:22.6616547Z ) 2022-12-01T10:56:22.6616749Z ) 2022-12-01T10:56:22.6617125Z (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-12-01T10:56:22.6617417Z ) 2022-12-01T10:56:22.6617615Z ) 2022-12-01T10:56:22.6617913Z (output_proj): Linear(in_features=16, out_features=23, bias=True) 2022-12-01T10:56:22.6618412Z (bn): BatchNorm1d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) 2022-12-01T10:56:22.6618712Z ) 2022-12-01T10:56:22.6618919Z ) 2022-12-01T10:56:22.6619120Z ) 2022-12-01T10:56:22.6619470Z ERROR: expected to be in states [] but current state is TrainingState_.SUMMON_FULL_PARAMS 2022-12-01T10:56:22.6619835Z ok (5.119s) 2022-12-01T10:56:22.6620112Z test_nested_module_apply (__main__.TestApply) 2022-12-01T10:56:22.6620733Z Tests that ``apply()`` modifies parameter values in-place on a ... 
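The ERROR text above is the expected output of test_apply_in_summon_raises_error: FSDP.apply() asserts the wrapper is in the IDLE state, so calling it while full parameters are summoned fails. A rough sketch of the pattern being exercised, with illustrative names rather than the test's own helpers:

    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def init_weights(m):
        if isinstance(m, nn.Linear):
            nn.init.ones_(m.weight)

    # fsdp_model = FSDP(module)              # wrapped as in the test above
    # fsdp_model.apply(init_weights)         # fine: instance is IDLE
    # with FSDP.summon_full_params(fsdp_model):
    #     fsdp_model.apply(init_weights)     # raises: state is SUMMON_FULL_PARAMS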
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36796 2022-12-01T10:56:22.6621254Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36797 2022-12-01T10:56:22.6621939Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:22.6622388Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:22.6622949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:22.6623425Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:22.6639261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:22.6639800Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:22.6640427Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:22.6640892Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:22.6641349Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:22.6641863Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:22.6642872Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:22.6643591Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:22.6644112Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:22.6644580Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:22.6645806Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:56:22.6646687Z warnings.warn( 2022-12-01T10:56:22.6647838Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:56:22.6648590Z warnings.warn( 2022-12-01T10:56:22.6649349Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:22.6649890Z warnings.warn( 2022-12-01T10:56:22.6650624Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:22.6651162Z warnings.warn( 2022-12-01T10:56:22.6651408Z dist init r=0, world=2 2022-12-01T10:56:22.6651641Z dist init r=1, world=2 2022-12-01T10:56:22.6651878Z ok (3.610s) 2022-12-01T10:56:22.6652168Z test_transformer_module_apply (__main__.TestApply) 2022-12-01T10:56:22.6652780Z Tests that ``apply()`` modifies parameter values in-place on an ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36875 2022-12-01T10:56:22.6653317Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36876 2022-12-01T10:56:22.6653925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:22.6654480Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:22.6655043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:22.6655508Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:22.6656082Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:22.6656525Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:22.6657072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:22.6657532Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:22.6657985Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:22.6658471Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:22.6659133Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:22.6659817Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:22.6660335Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:22.6660787Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:22.6661998Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:56:22.6662760Z warnings.warn( 2022-12-01T10:56:22.6663933Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 
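The repeated "Module is put on CPU" warning recommends the device_id argument so that FSDP moves the module to this rank's GPU before flattening and sharding. A minimal sketch of that call, assuming a process group is already initialized as in these tests:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    module = nn.Linear(16, 16)  # starts out on CPU
    fsdp_model = FSDP(
        module,
        device_id=torch.cuda.current_device(),  # FSDP moves the module to the GPU
        sync_module_states=True,                 # broadcast needs GPU communication
    )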
2022-12-01T10:56:22.6664696Z warnings.warn( 2022-12-01T10:56:22.6665439Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:22.6665975Z warnings.warn( 2022-12-01T10:56:22.6666720Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:22.6667263Z warnings.warn( 2022-12-01T10:56:22.6667495Z dist init r=1, world=2 2022-12-01T10:56:22.6667744Z dist init r=0, world=2 2022-12-01T10:56:22.6667978Z ok (3.810s) 2022-12-01T10:56:22.6668124Z 2022-12-01T10:56:22.6668378Z ---------------------------------------------------------------------- 2022-12-01T10:56:22.6668702Z Ran 3 tests in 12.540s 2022-12-01T10:56:22.6668862Z 2022-12-01T10:56:22.6668955Z OK 2022-12-01T10:56:22.6669088Z 2022-12-01T10:56:22.6669194Z Generating XML reports... 2022-12-01T10:56:22.6669759Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_apply/TEST-TestApply-20221201105609.xml 2022-12-01T10:56:22.6670090Z 2022-12-01T10:56:22.6670436Z ##[endgroup] 2022-12-01T10:56:22.6671007Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_apply (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_apply_6dnkwz8m) 2022-12-01T10:56:22.6671441Z 2022-12-01T10:56:22.6671739Z Running distributed/fsdp/test_distributed_checkpoint ... [2022-12-01 10:56:22.655815] 2022-12-01T10:56:22.6672464Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_distributed_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:56:22.656114] 2022-12-01T10:56:33.0428167Z 2022-12-01T10:56:33.0428675Z Expand the folded group to see the log file of distributed/fsdp/test_distributed_checkpoint 2022-12-01T10:56:33.0430294Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_distributed_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_distributed_checkpoint_ufh3ilqc) 2022-12-01T10:56:33.0431110Z 2022-12-01T10:56:33.0431357Z Running tests... 2022-12-01T10:56:33.0432242Z ---------------------------------------------------------------------- 2022-12-01T10:56:33.0432851Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_distributed_checkpoint 2022-12-01T10:56:33.0433478Z test_distributed_checkpoint_state_dict_type_StateDictType_LOCAL_STATE_DICT (__main__.TestDistributedCheckpoint) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:56:33.0434034Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36989 2022-12-01T10:56:33.0434508Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36990 2022-12-01T10:56:33.0435109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:33.0435567Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:33.0436150Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:33.0436623Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:33.0437189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:33.0437646Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:33.0438470Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:33.0438967Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:33.0439403Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:33.0439901Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:33.0440571Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:33.0441244Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:33.0441772Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:33.0442255Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:33.0446131Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:33.0446698Z warnings.warn( 2022-12-01T10:56:33.0447445Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:33.0447993Z warnings.warn( 2022-12-01T10:56:33.0448250Z dist init r=1, world=2 2022-12-01T10:56:33.0448485Z dist init r=0, world=2 2022-12-01T10:56:33.0448724Z ok (5.062s) 2022-12-01T10:56:33.0449248Z test_distributed_checkpoint_state_dict_type_StateDictType_SHARDED_STATE_DICT (__main__.TestDistributedCheckpoint) ... 
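The two TestDistributedCheckpoint cases sweep the FSDP state-dict types named in the test ids. A minimal sketch of selecting one, assuming an already-wrapped fsdp_model; this is illustrative, not the test's own code:

    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

    # LOCAL_STATE_DICT keeps each rank's flat shard; SHARDED_STATE_DICT yields
    # sharded tensors suitable for distributed checkpoint save/load.
    with FSDP.state_dict_type(fsdp_model, StateDictType.SHARDED_STATE_DICT):
        state = fsdp_model.state_dict()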
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37072 2022-12-01T10:56:33.0450022Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37073 2022-12-01T10:56:33.0450621Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:33.0451072Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:33.0451650Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:33.0452103Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:33.0452677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:33.0453123Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:33.0453698Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:33.0454146Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:33.0454609Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:33.0455113Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:33.0455776Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:33.0456451Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:33.0456973Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:33.0457449Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:33.0458398Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:33.0458971Z warnings.warn( 2022-12-01T10:56:33.0459733Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:33.0460275Z warnings.warn( 2022-12-01T10:56:33.0460507Z dist init r=0, world=2 2022-12-01T10:56:33.0460761Z dist init r=1, world=2 2022-12-01T10:56:33.0461003Z ok (3.511s) 2022-12-01T10:56:33.0461154Z 2022-12-01T10:56:33.0461411Z ---------------------------------------------------------------------- 2022-12-01T10:56:33.0461744Z Ran 2 tests in 8.573s 2022-12-01T10:56:33.0461906Z 2022-12-01T10:56:33.0462006Z OK 2022-12-01T10:56:33.0462141Z 2022-12-01T10:56:33.0462268Z Generating XML reports... 
2022-12-01T10:56:33.0462914Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_distributed_checkpoint/TEST-TestDistributedCheckpoint-20221201105624.xml 2022-12-01T10:56:33.0463316Z 2022-12-01T10:56:33.0463642Z ##[endgroup] 2022-12-01T10:56:33.0464297Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_distributed_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_distributed_checkpoint_ufh3ilqc) 2022-12-01T10:56:33.0464691Z 2022-12-01T10:56:33.0464978Z Running distributed/fsdp/test_fsdp_multiple_forward ... [2022-12-01 10:56:33.042866] 2022-12-01T10:56:33.0465697Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_multiple_forward.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:56:33.043131] 2022-12-01T10:56:40.3978513Z 2022-12-01T10:56:40.3979494Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_multiple_forward 2022-12-01T10:56:40.3981039Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_multiple_forward (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_multiple_forward_7g2g6i63) 2022-12-01T10:56:40.3981452Z 2022-12-01T10:56:40.3981572Z Running tests... 2022-12-01T10:56:40.3982113Z ---------------------------------------------------------------------- 2022-12-01T10:56:40.3982713Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_multiple_forward 2022-12-01T10:56:40.3983213Z test_multi_forward (__main__.TestMultiForward) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:56:40.3983660Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37190 2022-12-01T10:56:40.3984113Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37191 2022-12-01T10:56:40.3984750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:40.3985188Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:40.3985765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:40.3986238Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:40.3986831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:40.3987248Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:40.3987819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:40.3988286Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:40.3988718Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:40.3989222Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:40.3989879Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:40.3990697Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:56:40.3991232Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:40.3991720Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:40.3992189Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:56:40.3992677Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T10:56:40.3993906Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:56:40.3994680Z warnings.warn( 2022-12-01T10:56:40.3995794Z /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:1427: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in `device_id` argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with `sync_module_states=True` flag which requires GPU communication. 2022-12-01T10:56:40.3996541Z warnings.warn( 2022-12-01T10:56:40.3997300Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:40.3997916Z warnings.warn( 2022-12-01T10:56:40.3998674Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:40.3999222Z warnings.warn( 2022-12-01T10:56:40.3999991Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:56:40.4000522Z warnings.warn( 2022-12-01T10:56:40.4001287Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:56:40.4001841Z warnings.warn( 2022-12-01T10:56:40.4002087Z dist init r=0, world=2 2022-12-01T10:56:40.4002321Z dist init r=1, world=2 2022-12-01T10:56:40.4002851Z ok (5.484s) 2022-12-01T10:56:40.4003003Z 2022-12-01T10:56:40.4003287Z ---------------------------------------------------------------------- 2022-12-01T10:56:40.4003598Z Ran 1 test in 5.484s 2022-12-01T10:56:40.4003757Z 2022-12-01T10:56:40.4003863Z OK 2022-12-01T10:56:40.4003987Z 2022-12-01T10:56:40.4004113Z Generating XML reports... 
2022-12-01T10:56:40.4004716Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_multiple_forward/TEST-TestMultiForward-20221201105634.xml 2022-12-01T10:56:40.4005089Z 2022-12-01T10:56:40.4005412Z ##[endgroup] 2022-12-01T10:56:40.4006058Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_multiple_forward (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_multiple_forward_7g2g6i63) 2022-12-01T10:56:40.4006450Z 2022-12-01T10:56:40.4006726Z Running distributed/fsdp/test_fsdp_uneven ... [2022-12-01 10:56:40.397886] 2022-12-01T10:56:40.4007517Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_uneven.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:56:40.398154] 2022-12-01T10:56:47.7346633Z 2022-12-01T10:56:47.7347377Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_uneven 2022-12-01T10:56:47.7348449Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_uneven (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_uneven_r7j6hkdh) 2022-12-01T10:56:47.7348827Z 2022-12-01T10:56:47.7348942Z Running tests... 2022-12-01T10:56:47.7349468Z ---------------------------------------------------------------------- 2022-12-01T10:56:47.7350027Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_uneven 2022-12-01T10:56:47.7350476Z test_one_iteration (__main__.TestUnevenParamShard) 2022-12-01T10:56:47.7350889Z Test FSDP with uneven divide of parameter shards. ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:56:47.7351387Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37308 2022-12-01T10:56:47.7351843Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37309 2022-12-01T10:56:47.7352463Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:47.7352901Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:47.7353477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:47.7353952Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:47.7354509Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:47.7354954Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:47.7355826Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:47.7356294Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:47.7356731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:47.7357229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:47.7357886Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:47.7358579Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-12-01T10:56:47.7359086Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:47.7359556Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:47.7360439Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:47.7360993Z warnings.warn( 2022-12-01T10:56:47.7361726Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2387: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead. 2022-12-01T10:56:47.7362270Z warnings.warn( 2022-12-01T10:56:47.7363335Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:56:47.7363887Z warnings.warn( 2022-12-01T10:56:47.7364624Z /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2849: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead. 2022-12-01T10:56:47.7365291Z warnings.warn( 2022-12-01T10:56:47.7365577Z dist init r=0, world=2 2022-12-01T10:56:47.7365831Z dist init r=1, world=2 2022-12-01T10:56:47.7366050Z ok (5.495s) 2022-12-01T10:56:47.7366196Z 2022-12-01T10:56:47.7366473Z ---------------------------------------------------------------------- 2022-12-01T10:56:47.7366799Z Ran 1 test in 5.495s 2022-12-01T10:56:47.7366958Z 2022-12-01T10:56:47.7367033Z OK 2022-12-01T10:56:47.7367166Z 2022-12-01T10:56:47.7367293Z Generating XML reports... 2022-12-01T10:56:47.7367902Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_uneven/TEST-TestUnevenParamShard-20221201105641.xml 2022-12-01T10:56:47.7368270Z 2022-12-01T10:56:47.7368578Z ##[endgroup] 2022-12-01T10:56:47.7369173Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_uneven (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_uneven_r7j6hkdh) 2022-12-01T10:56:47.7369532Z 2022-12-01T10:56:47.7369824Z Running distributed/fsdp/test_checkpoint_wrapper ... [2022-12-01 10:56:47.734660] 2022-12-01T10:56:47.7370525Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_checkpoint_wrapper.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:56:47.735028] 2022-12-01T10:56:51.6782823Z 2022-12-01T10:56:51.6783556Z Expand the folded group to see the log file of distributed/fsdp/test_checkpoint_wrapper 2022-12-01T10:56:51.6784947Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_checkpoint_wrapper (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_checkpoint_wrapper_y10312z7) 2022-12-01T10:56:51.6785360Z 2022-12-01T10:56:51.6785485Z Running tests... 
2022-12-01T10:56:51.6786035Z ---------------------------------------------------------------------- 2022-12-01T10:56:51.6786621Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_checkpoint_wrapper 2022-12-01T10:56:51.6787393Z test_apply_activation_checkpointing (__main__.CheckpointWrapperTest) 2022-12-01T10:56:51.6787848Z Ensures that `apply_activation_checkpointing` can be used ... ok (1.674s) 2022-12-01T10:56:51.6788284Z test_checkpoint_wrapper_cpu_offload (__main__.CheckpointWrapperTest) ... ok (0.410s) 2022-12-01T10:56:51.6788739Z test_checkpoint_wrapper_kwarg_support (__main__.CheckpointWrapperTest) ... ok (0.009s) 2022-12-01T10:56:51.6789175Z test_checkpoint_wrapper_parity (__main__.CheckpointWrapperTest) 2022-12-01T10:56:51.6790291Z Tests that using checkpoint_wrapper or the functional ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/79510 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.002s) 2022-12-01T10:56:51.6791089Z test_forward_missing_attributes (__main__.CheckpointWrapperTest) ... ok (0.001s) 2022-12-01T10:56:51.6791502Z test_fqn (__main__.CheckpointWrapperTest) ... ok (0.001s) 2022-12-01T10:56:51.6791915Z test_load_activation_checkpointed_module (__main__.CheckpointWrapperTest) ... ok (0.003s) 2022-12-01T10:56:51.6792187Z 2022-12-01T10:56:51.6792462Z ---------------------------------------------------------------------- 2022-12-01T10:56:51.6792793Z Ran 7 tests in 2.101s 2022-12-01T10:56:51.6792961Z 2022-12-01T10:56:51.6793048Z OK (skipped=1) 2022-12-01T10:56:51.6793202Z 2022-12-01T10:56:51.6793327Z Generating XML reports... 2022-12-01T10:56:51.6793961Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_checkpoint_wrapper/TEST-CheckpointWrapperTest-20221201105649.xml 2022-12-01T10:56:51.6794341Z 2022-12-01T10:56:51.6794652Z ##[endgroup] 2022-12-01T10:56:51.6795290Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_checkpoint_wrapper (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_checkpoint_wrapper_y10312z7) 2022-12-01T10:56:51.6795676Z 2022-12-01T10:56:51.6795939Z Running distributed/fsdp/test_fsdp_fx ... [2022-12-01 10:56:51.678257] 2022-12-01T10:56:51.6796722Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_fsdp_fx.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:56:51.678512] 2022-12-01T10:56:58.5136427Z 2022-12-01T10:56:58.5136935Z Expand the folded group to see the log file of distributed/fsdp/test_fsdp_fx 2022-12-01T10:56:58.5137884Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_fsdp_fx (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_fx_mykue_cn) 2022-12-01T10:56:58.5138239Z 2022-12-01T10:56:58.5138348Z Running tests... 2022-12-01T10:56:58.5138841Z ---------------------------------------------------------------------- 2022-12-01T10:56:58.5139392Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_fx 2022-12-01T10:56:58.5139856Z test_symbolic_tracing_outputs (__main__.TestSymbolicTracing) 2022-12-01T10:56:58.5140332Z test ``execution_info.module_forward_order`` and ``execution_info.module_to_execution_infos`` ... 
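The CheckpointWrapperTest results above exercise apply_activation_checkpointing. A minimal sketch of the usual call, assuming this import path and these keyword names in the build under test:

    import torch.nn as nn
    from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
        apply_activation_checkpointing,
        checkpoint_wrapper,
    )

    model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

    # Wrap every Linear so its activations are recomputed during backward.
    apply_activation_checkpointing(
        model,
        checkpoint_wrapper_fn=checkpoint_wrapper,
        check_fn=lambda m: isinstance(m, nn.Linear),
    )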
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:56:58.5140853Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37477 2022-12-01T10:56:58.5141302Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37478 2022-12-01T10:56:58.5141926Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:58.5142362Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:58.5142936Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:58.5143404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:58.5143982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:56:58.5144707Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:56:58.5145312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:56:58.5145782Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:56:58.5146237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:56:58.5146717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:56:58.5147377Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:58.5148070Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:56:58.5148575Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:56:58.5149047Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:56:58.5149399Z dist init r=0, world=2 2022-12-01T10:56:58.5149649Z dist init r=1, world=2 2022-12-01T10:56:58.5149869Z ok (4.998s) 2022-12-01T10:56:58.5150017Z 2022-12-01T10:56:58.5150293Z ---------------------------------------------------------------------- 2022-12-01T10:56:58.5150618Z Ran 1 test in 4.999s 2022-12-01T10:56:58.5150776Z 2022-12-01T10:56:58.5150851Z OK 2022-12-01T10:56:58.5150983Z 2022-12-01T10:56:58.5151107Z Generating XML reports... 2022-12-01T10:56:58.5151699Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_fx/TEST-TestSymbolicTracing-20221201105653.xml 2022-12-01T10:56:58.5152052Z 2022-12-01T10:56:58.5152348Z ##[endgroup] 2022-12-01T10:56:58.5152920Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_fx (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_fsdp_fx_mykue_cn) 2022-12-01T10:56:58.5153267Z 2022-12-01T10:56:58.5153563Z Running distributed/_shard/checkpoint/test_checkpoint ... [2022-12-01 10:56:58.513624] 2022-12-01T10:56:58.5154390Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/checkpoint/test_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-12-01 10:56:58.513911] 2022-12-01T10:57:20.3221555Z 2022-12-01T10:57:20.3222085Z Expand the folded group to see the log file of distributed/_shard/checkpoint/test_checkpoint 2022-12-01T10:57:20.3225258Z ##[group]PRINTING LOG FILE of distributed/_shard/checkpoint/test_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-checkpoint-test_checkpoint_7y56iovo) 2022-12-01T10:57:20.3225693Z 2022-12-01T10:57:20.3225787Z Running tests... 2022-12-01T10:57:20.3227322Z ---------------------------------------------------------------------- 2022-12-01T10:57:20.3228192Z Test results will be stored in test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint 2022-12-01T10:57:20.3228968Z test_default_metadata (__main__.TestDistributedCheckpointing) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:57:20.3229501Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37591 2022-12-01T10:57:20.3232583Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37592 2022-12-01T10:57:20.3233299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3233771Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3234376Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3234841Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3235426Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3236158Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3236752Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3237212Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3237660Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:57:20.3238134Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:57:20.3238635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:57:20.3239117Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:57:20.3239789Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:57:20.3240489Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:57:20.3240909Z ok (5.010s) 2022-12-01T10:57:20.3241384Z test_tensor_metadata_with_missing_rank_spec (__main__.TestDistributedCheckpointing) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37670 2022-12-01T10:57:20.3241968Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37671 2022-12-01T10:57:20.3243033Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3243501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3244069Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3244551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3245128Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3245567Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3246272Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3246778Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3247222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:57:20.3247675Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:57:20.3248162Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:57:20.3248662Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:57:20.3249324Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:57:20.3250006Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T10:57:20.3250407Z ok (3.409s) 2022-12-01T10:57:20.3250856Z test_dummy_reader_works (__main__.TestDistributedFailure) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37749 2022-12-01T10:57:20.3251366Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37750 2022-12-01T10:57:20.3251818Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 37751 2022-12-01T10:57:20.3252258Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 37752 2022-12-01T10:57:20.3252867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3253302Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3254016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3254489Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3255055Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3255507Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3256085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3256554Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3257111Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3257558Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3258132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3258598Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3259160Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3259604Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3260178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3260621Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3261058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:57:20.3261527Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:57:20.3261991Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:57:20.3262439Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:57:20.3262820Z skip: Need at least 4 CUDA devices (2.009s) 2022-12-01T10:57:20.3263362Z test_dummy_writer_works (__main__.TestDistributedFailure) ... 
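The "Need at least 4 CUDA devices" skips come from the GPU-count guard used throughout these distributed tests, which compares the required rank count against torch.cuda.device_count() on this 2-GPU runner. A minimal sketch of the pattern, assuming the internal decorator name:

    import torch
    from torch.testing._internal.common_distributed import skip_if_lt_x_gpu
    from torch.testing._internal.common_utils import TestCase, run_tests

    class FourGpuTest(TestCase):
        @skip_if_lt_x_gpu(4)  # reported as "skip: Need at least 4 CUDA devices"
        def test_needs_four_gpus(self):
            self.assertGreaterEqual(torch.cuda.device_count(), 4)

    if __name__ == "__main__":
        run_tests()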
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37885 2022-12-01T10:57:20.3263895Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37886 2022-12-01T10:57:20.3264346Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 37887 2022-12-01T10:57:20.3264785Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 37888 2022-12-01T10:57:20.3265407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3265837Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3266414Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3266960Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3267546Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3267997Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3268552Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3269015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3269590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3270034Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3270582Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3271127Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3271712Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3272157Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3272707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3273173Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3273608Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:57:20.3274060Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:57:20.3274528Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:57:20.3274988Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:57:20.3275380Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:57:20.3275846Z test_load_error_handling (__main__.TestDistributedFailure) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38021 2022-12-01T10:57:20.3276375Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38022 2022-12-01T10:57:20.3276866Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 38023 2022-12-01T10:57:20.3277290Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 38024 2022-12-01T10:57:20.3277902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3278356Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3278930Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3279379Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3280029Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3280500Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3281075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3281520Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3282098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3282744Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3283305Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3283774Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3284362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3284803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3285349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3285811Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3286248Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:57:20.3286701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:57:20.3287167Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:57:20.3287623Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:57:20.3288137Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:57:20.3288619Z test_load_error_handling_no_dist (__main__.TestDistributedFailure) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38157 2022-12-01T10:57:20.3289168Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38158 2022-12-01T10:57:20.3289617Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 38159 2022-12-01T10:57:20.3290057Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 38160 2022-12-01T10:57:20.3290654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3291103Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3291674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3292132Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3292713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3293162Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3293724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3294154Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3294728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3295192Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3295756Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3296221Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3296796Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3297239Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3297869Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3298360Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3298795Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:57:20.3299270Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:57:20.3299717Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:57:20.3300177Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:57:20.3300519Z ok (1.909s) 2022-12-01T10:57:20.3300945Z test_save_error_handling (__main__.TestDistributedFailure) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38293 2022-12-01T10:57:20.3301485Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38294 2022-12-01T10:57:20.3301932Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 38295 2022-12-01T10:57:20.3302372Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 38296 2022-12-01T10:57:20.3302970Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3303422Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3303984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3304408Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3304977Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3305522Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3306125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3306572Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3307147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3307586Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3308132Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3308591Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3309163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3309606Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3310157Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3310627Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3311068Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:57:20.3311545Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:57:20.3311993Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:57:20.3312455Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:57:20.3312850Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:57:20.3313324Z test_save_error_handling_no_dist (__main__.TestDistributedFailure) ... 
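Note: each "Started process N with pid ..." / "Starting event listener thread for rank N" pair above comes from the multiprocess harness in torch.testing._internal.common_distributed, which launches one worker per rank before the test body runs; each worker additionally starts a listener thread to report results back to the parent. A rough sketch of the spawn step under the assumption of a fixed world size of 4; the helper names below are illustrative, not the exact implementation.

import logging
import multiprocessing as mp

logger = logging.getLogger("torch.testing._internal.common_distributed")

def _worker(rank: int, world_size: int) -> None:
    # In the real harness each worker runs the test method for its rank;
    # here it is a placeholder.
    pass

def spawn_workers(world_size: int = 4):
    # Illustrative only: start one process per rank and log its pid, which is
    # what produces the "Started process N with pid M" lines in this log.
    procs = []
    for rank in range(world_size):
        p = mp.Process(target=_worker, args=(rank, world_size))
        p.start()
        logger.info("Started process %d with pid %d", rank, p.pid)
        procs.append(p)
    return procs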
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38429 2022-12-01T10:57:20.3313866Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38430 2022-12-01T10:57:20.3314371Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 38431 2022-12-01T10:57:20.3314832Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 38432 2022-12-01T10:57:20.3315428Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3315882Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3316446Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3316874Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3317442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3317920Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3318506Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3318929Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3319503Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3319969Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3320532Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3320997Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3321573Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:57:20.3322107Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:57:20.3323101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:57:20.3323578Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:57:20.3324014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:57:20.3324481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:57:20.3324925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:57:20.3325383Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:57:20.3325724Z ok (1.909s) 2022-12-01T10:57:20.3325873Z 2022-12-01T10:57:20.3326132Z ---------------------------------------------------------------------- 2022-12-01T10:57:20.3326468Z Ran 8 tests in 19.974s 2022-12-01T10:57:20.3326636Z 2022-12-01T10:57:20.3326743Z OK (skipped=4) 2022-12-01T10:57:20.3326896Z 2022-12-01T10:57:20.3327018Z Generating XML reports... 
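Note: the "skip: Need at least 4 CUDA devices" results (and the suite summary "OK (skipped=4)") come from a GPU-count gate applied to these distributed tests: when fewer than four CUDA devices are visible, the decorated test raises unittest.SkipTest instead of running. A hedged sketch of such a gate, modelled on skip_if_lt_x_gpu from torch.testing._internal.common_distributed; treat the exact name and signature as an assumption.

import functools
import unittest
import torch

def skip_if_lt_x_gpu(x):
    # Run the test only when at least x CUDA devices are visible; otherwise
    # skip with the message seen throughout this log.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if torch.cuda.is_available() and torch.cuda.device_count() >= x:
                return fn(*args, **kwargs)
            raise unittest.SkipTest(f"Need at least {x} CUDA devices")
        return wrapper
    return decorator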
2022-12-01T10:57:20.3327680Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestDistributedCheckpointing-20221201105700.xml 2022-12-01T10:57:20.3328558Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestDistributedFailure-20221201105700.xml 2022-12-01T10:57:20.3328941Z 2022-12-01T10:57:20.3329384Z ##[endgroup] 2022-12-01T10:57:20.3330033Z FINISHED PRINTING LOG FILE of distributed/_shard/checkpoint/test_checkpoint (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-checkpoint-test_checkpoint_7y56iovo) 2022-12-01T10:57:20.3330425Z 2022-12-01T10:57:20.3330716Z Running distributed/_shard/checkpoint/test_planner ... [2022-12-01 10:57:20.322345] 2022-12-01T10:57:20.3331430Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/checkpoint/test_planner.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:57:20.322638] 2022-12-01T10:57:23.7760214Z 2022-12-01T10:57:23.7761014Z Expand the folded group to see the log file of distributed/_shard/checkpoint/test_planner 2022-12-01T10:57:23.7762073Z ##[group]PRINTING LOG FILE of distributed/_shard/checkpoint/test_planner (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-checkpoint-test_planner_i3fbr_4u) 2022-12-01T10:57:23.7763089Z 2022-12-01T10:57:23.7763276Z Running tests... 2022-12-01T10:57:23.7763976Z ---------------------------------------------------------------------- 2022-12-01T10:57:23.7764566Z Test results will be stored in test-reports/python-unittest/distributed._shard.checkpoint.test_planner 2022-12-01T10:57:23.7765016Z test_global_plan (__main__.TestSavePlan) ... ok (1.594s) 2022-12-01T10:57:23.7765382Z test_load_with_resharding (__main__.TestSavePlan) ... ok (0.004s) 2022-12-01T10:57:23.7765788Z test_load_with_world_size_diff_by_one (__main__.TestSavePlan) ... ok (0.003s) 2022-12-01T10:57:23.7766197Z test_local_load_plan (__main__.TestSavePlan) ... ok (0.004s) 2022-12-01T10:57:23.7766542Z test_local_plan (__main__.TestSavePlan) ... ok (0.003s) 2022-12-01T10:57:23.7766774Z 2022-12-01T10:57:23.7767268Z ---------------------------------------------------------------------- 2022-12-01T10:57:23.7767657Z Ran 5 tests in 1.610s 2022-12-01T10:57:23.7767822Z 2022-12-01T10:57:23.7767916Z OK 2022-12-01T10:57:23.7768032Z 2022-12-01T10:57:23.7768159Z Generating XML reports... 2022-12-01T10:57:23.7768784Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_planner/TEST-TestSavePlan-20221201105721.xml 2022-12-01T10:57:23.7769155Z 2022-12-01T10:57:23.7769484Z ##[endgroup] 2022-12-01T10:57:23.7770111Z FINISHED PRINTING LOG FILE of distributed/_shard/checkpoint/test_planner (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-checkpoint-test_planner_i3fbr_4u) 2022-12-01T10:57:23.7770667Z 2022-12-01T10:57:23.7770986Z Running distributed/_shard/sharded_tensor/test_sharded_tensor ... [2022-12-01 10:57:23.776049] 2022-12-01T10:57:23.7771733Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/test_sharded_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
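Note: the "Running ... / Executing [...]" lines show how the test driver handles each file: every test module is launched as its own Python subprocess with -bb and the flags that import the slow/disabled test lists, and its output is captured to a per-test report that is later printed inside a folded ##[group] block. A minimal sketch of that launch step; the function name and printing are assumptions, while the command line mirrors the log.

import subprocess
import sys
from datetime import datetime

def run_test_module(test_file: str) -> int:
    cmd = [sys.executable, "-bb", test_file, "-v",
           "--import-slow-tests", "--import-disabled-tests"]
    print(f"Running {test_file} ... [{datetime.now()}]")
    print(f"Executing {cmd} ... [{datetime.now()}]")
    # check=False: the driver inspects the return code itself rather than raising.
    return subprocess.run(cmd, check=False).returncode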
[2022-12-01 10:57:23.776504] 2022-12-01T10:59:24.2862583Z 2022-12-01T10:59:24.2863111Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/test_sharded_tensor 2022-12-01T10:59:24.2864989Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/test_sharded_tensor (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-test_sharded_tensor_qbnh3ebn) 2022-12-01T10:59:24.2865393Z 2022-12-01T10:59:24.2865507Z Running tests... 2022-12-01T10:59:24.2868259Z ---------------------------------------------------------------------- 2022-12-01T10:59:24.2868947Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor 2022-12-01T10:59:24.2869706Z test_empty (__main__.TestCreateTensorFromParams) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T10:59:24.2870110Z ok (1.578s) 2022-12-01T10:59:24.2870523Z test_local_tensor (__main__.TestLocalTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38634 2022-12-01T10:59:24.2873349Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38635 2022-12-01T10:59:24.2874249Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 38636 2022-12-01T10:59:24.2875032Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 38637 2022-12-01T10:59:24.2876451Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2876931Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2877499Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2877974Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2878811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2879773Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2880975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2881874Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2883399Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2884242Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2885352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2886235Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2887318Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2888089Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2889151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2890044Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2890841Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.2891709Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 
2022-12-01T10:59:24.2892596Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.2893454Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.2895128Z skip: Need at least 4 CUDA devices (1.911s) 2022-12-01T10:59:24.2895970Z test_local_tensor_error (__main__.TestLocalTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38770 2022-12-01T10:59:24.2896884Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38771 2022-12-01T10:59:24.2897654Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 38772 2022-12-01T10:59:24.2898460Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 38773 2022-12-01T10:59:24.2899545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2900364Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2901474Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2902355Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2903374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2904206Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2905321Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2906164Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2907215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2908079Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2909264Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2909889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2910465Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2911028Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2911619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2912064Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2912502Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.2912979Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.2913447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.2913899Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.2914293Z skip: Need at least 4 CUDA devices (2.009s) 2022-12-01T10:59:24.2914769Z test_collect_local_shard (__main__.TestModuleHookApi) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38906 2022-12-01T10:59:24.2915277Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38907 2022-12-01T10:59:24.2915730Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 38908 2022-12-01T10:59:24.2916167Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 38909 2022-12-01T10:59:24.2916776Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2917205Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2917779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2918246Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2918907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2919338Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2919909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2920374Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2920933Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2921380Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2921950Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2922808Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2923496Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2923948Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2924523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2924967Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2925401Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.2925880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.2926347Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.2926796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.2927183Z skip: Need at least 4 CUDA devices (1.809s) 2022-12-01T10:59:24.2927655Z test_reshard_output (__main__.TestModuleHookApi) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39042 2022-12-01T10:59:24.2928283Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39043 2022-12-01T10:59:24.2928737Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 39044 2022-12-01T10:59:24.2929172Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 39045 2022-12-01T10:59:24.2929784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2930214Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2930784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2931253Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2931830Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2932261Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2932837Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2933305Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2933864Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2934305Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2934872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2935332Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2935888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2936446Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2937025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2937472Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2937908Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.2938386Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.2938855Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.2939302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.2939691Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.2940182Z test_create_shard_with_no_placement (__main__.TestShardMetadata) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39178 2022-12-01T10:59:24.2940730Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39179 2022-12-01T10:59:24.2941163Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 39180 2022-12-01T10:59:24.2941601Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 39181 2022-12-01T10:59:24.2942214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2942644Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2943216Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2943682Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2944255Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2944682Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2945316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2945795Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2946359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2946803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2947370Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2947831Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2948385Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2948838Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2949408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2949873Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2950292Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.2950766Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.2951228Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.2951675Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.2952066Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.2952543Z test_shard_metadata_init (__main__.TestShardMetadata) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39314 2022-12-01T10:59:24.2953140Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39315 2022-12-01T10:59:24.2953572Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 39316 2022-12-01T10:59:24.2954008Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 39317 2022-12-01T10:59:24.2954623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2955055Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2955631Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2956098Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2956677Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2957103Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2957679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2958144Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2958700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2959144Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2959709Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2960168Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2960722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2961169Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2961743Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2962269Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2963229Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.2963705Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.2964173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.2964620Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.2965015Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.2965488Z test_shard_parameter (__main__.TestShardParameter) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39450 2022-12-01T10:59:24.2966009Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39451 2022-12-01T10:59:24.2966444Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 39452 2022-12-01T10:59:24.2966881Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 39453 2022-12-01T10:59:24.2967501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2967972Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2968548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2969015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2969592Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2970016Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2970724Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2971189Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2971769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2972194Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2972765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2973228Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2973784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2974226Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2974796Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2975262Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2975683Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.2976158Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.2976625Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.2977072Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.2977465Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.2977947Z test_shard_parameter_errors (__main__.TestShardParameter) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39586 2022-12-01T10:59:24.2978482Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39587 2022-12-01T10:59:24.2978914Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 39588 2022-12-01T10:59:24.2979351Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 39589 2022-12-01T10:59:24.2980045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2980496Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2981077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2981544Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2982125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2982551Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2983119Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2983589Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2984169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2984592Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2985165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2985624Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2986177Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2986619Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2987181Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2987716Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2988134Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.2988610Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.2989073Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.2989520Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.2989913Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.2990374Z test_shard_tensor (__main__.TestShardTensor) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39722 2022-12-01T10:59:24.2990881Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39723 2022-12-01T10:59:24.2991308Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 39724 2022-12-01T10:59:24.2991750Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 39725 2022-12-01T10:59:24.2992364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2992818Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2993374Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2993842Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2994420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2994843Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2995412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2995877Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2996530Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2996971Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2997540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.2997998Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.2998548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.2998988Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.2999556Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3000015Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3000437Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3000915Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3001380Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3001847Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3002221Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3003110Z test_shard_tensor_errors (__main__.TestShardTensor) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39858 2022-12-01T10:59:24.3003632Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39859 2022-12-01T10:59:24.3004057Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 39860 2022-12-01T10:59:24.3004491Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 39861 2022-12-01T10:59:24.3005228Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3005683Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3006238Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3006704Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3007281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3007701Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3008273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3008734Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3009311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3009740Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3010311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3010772Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3011324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3011767Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3012337Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3012796Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3013219Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3013691Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3014243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3014731Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3015105Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3015582Z test_cleanup (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39994 2022-12-01T10:59:24.3016110Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39995 2022-12-01T10:59:24.3016537Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 39996 2022-12-01T10:59:24.3016973Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 39997 2022-12-01T10:59:24.3017594Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3018046Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3018602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3019072Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3019652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3020078Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3020654Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3021120Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3021697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3022203Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3022782Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3023244Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3023818Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3024237Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3024806Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3025267Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3025683Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3026161Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3026632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3027097Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3027470Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3027962Z test_complete_world_size (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40130 2022-12-01T10:59:24.3028501Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40131 2022-12-01T10:59:24.3028927Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 40132 2022-12-01T10:59:24.3029375Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 40133 2022-12-01T10:59:24.3029976Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3030433Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3031054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3031539Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3032122Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3032549Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3033116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3033577Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3034149Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3034577Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3035152Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3035611Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3036190Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3036613Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3037183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3037647Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3038063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3038536Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3039069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3039542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3039917Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3040300Z test_create_sharded_tensor_like (__main__.TestShardedTensorChunked) 2022-12-01T10:59:24.3040842Z Test tensor like methods, i.e. torch.zeros_like(...), torch.full_like, etc. ... 
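Note: test_create_sharded_tensor_like exercises the "_like" factory methods named in its docstring. For reference, on ordinary dense tensors these create a new tensor with the same shape (and, by default, dtype and device) as the input:

import torch

x = torch.arange(6, dtype=torch.float32).reshape(2, 3)
z = torch.zeros_like(x)        # same shape/dtype as x, filled with 0
f = torch.full_like(x, 7.0)    # same shape/dtype as x, filled with 7.0
assert z.shape == x.shape and f.shape == x.shape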
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40266 2022-12-01T10:59:24.3041360Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40267 2022-12-01T10:59:24.3041804Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 40268 2022-12-01T10:59:24.3042246Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 40269 2022-12-01T10:59:24.3043333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3043769Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3044346Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3044813Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3045396Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3045825Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3046392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3046856Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3047407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3047860Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3048554Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3049037Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3049601Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3050044Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3050614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3051057Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3051490Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3051971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3052438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3052888Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3053276Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3053665Z test_create_sharded_tensor_with_full (__main__.TestShardedTensorChunked) 2022-12-01T10:59:24.3054139Z Test sharded_tensor.full(...) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40402 2022-12-01T10:59:24.3054626Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40403 2022-12-01T10:59:24.3055075Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 40404 2022-12-01T10:59:24.3055514Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 40405 2022-12-01T10:59:24.3056206Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3056662Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3057236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3057707Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3058264Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3058711Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3059287Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3059735Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3060312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3060759Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3061326Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3061770Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3062349Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3062790Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3063344Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3063807Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3064242Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3064719Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3065231Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3065707Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3066101Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3066469Z test_create_sharded_tensor_with_ones (__main__.TestShardedTensorChunked) 2022-12-01T10:59:24.3066958Z Test sharded_tensor.ones(...) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40538 2022-12-01T10:59:24.3067447Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40539 2022-12-01T10:59:24.3067935Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 40540 2022-12-01T10:59:24.3068359Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 40541 2022-12-01T10:59:24.3068972Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3069425Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3070004Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3070453Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3071029Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3071472Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3072020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3072487Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3073060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3073580Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3074139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3074605Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3075178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3075600Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3076171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3076631Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3077064Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3077523Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3077989Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3078453Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3078827Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3079211Z test_create_sharded_tensor_with_rand (__main__.TestShardedTensorChunked) 2022-12-01T10:59:24.3079712Z Test sharded_tensor.rand(...)/randn(...) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40674 2022-12-01T10:59:24.3080212Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40675 2022-12-01T10:59:24.3080641Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 40676 2022-12-01T10:59:24.3081077Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 40677 2022-12-01T10:59:24.3081688Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3082203Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3083205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3083654Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3084227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3084677Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3085265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3085729Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3086316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3086740Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3087318Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3087782Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3088339Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3088782Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3089352Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3089813Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3090228Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3090819Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3091291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3091758Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3092135Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3092517Z test_create_sharded_tensor_with_zeros (__main__.TestShardedTensorChunked) 2022-12-01T10:59:24.3093012Z Test sharded_tensor.zeros(...) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40810 2022-12-01T10:59:24.3093480Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40811 2022-12-01T10:59:24.3093925Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 40812 2022-12-01T10:59:24.3094363Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 40813 2022-12-01T10:59:24.3094979Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3095416Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3095993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3096464Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3097022Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3097468Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3098038Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3098501Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3099064Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3099597Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3100189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3100635Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3101214Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3101658Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3102223Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3102666Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3103103Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3103578Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3104052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3104499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3104885Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3105242Z test_gather_even (__main__.TestShardedTensorChunked) 2022-12-01T10:59:24.3105742Z Test _sharded_tensor.gather(...) with evenly distributed._shards ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40946 2022-12-01T10:59:24.3106278Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40947 2022-12-01T10:59:24.3106725Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 40948 2022-12-01T10:59:24.3107168Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 40949 2022-12-01T10:59:24.3107844Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3108299Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3108872Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3109321Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3109899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3110347Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3110913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3111359Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3111944Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3112391Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3112945Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3113409Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3113986Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3114428Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3114975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3115443Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3115881Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3116358Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3116883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3117360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3117753Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3118094Z test_gather_uneven (__main__.TestShardedTensorChunked) 2022-12-01T10:59:24.3118614Z Test _sharded_tensor.gather(...) with unevenly distributed._shards ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41082 2022-12-01T10:59:24.3119153Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41083 2022-12-01T10:59:24.3119600Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 41084 2022-12-01T10:59:24.3120028Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 41085 2022-12-01T10:59:24.3120637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3121089Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3121642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3122112Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3123172Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3123629Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3124185Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3124652Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3125351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3125778Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3126350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3126817Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3127392Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3127813Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3128386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3128851Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3129294Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3129752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3130218Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3130684Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3131061Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3131567Z test_insufficient_sharding_dims (__main__.TestShardedTensorChunked) ... 
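Editor's note: the test_gather_even and test_gather_uneven cases above cover collecting all shards of a ShardedTensor into one full tensor on a destination rank. A hedged sketch of that path follows, assuming the same 4-process NCCL setup as the previous sketch and that ShardedTensor.gather takes dst/out arguments as in this build; names are illustrative, not the test source.

import torch
import torch.distributed as dist
from torch.distributed._shard import sharded_tensor
from torch.distributed._shard.sharding_spec import ChunkShardingSpec


def gather_to_rank0() -> None:
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    spec = ChunkShardingSpec(
        dim=0,
        placements=[f"rank:{r}/cuda:{r}" for r in range(world_size)],
    )
    st = sharded_tensor.rand(spec, 12, 8)

    # Only the destination rank allocates (and receives) the materialized tensor.
    out = torch.empty(12, 8, device=f"cuda:{rank}") if rank == 0 else None
    st.gather(dst=0, out=out)
    if rank == 0:
        print(out.shape)  # expected: torch.Size([12, 8])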
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41218 2022-12-01T10:59:24.3132117Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41219 2022-12-01T10:59:24.3132563Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 41220 2022-12-01T10:59:24.3132980Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 41221 2022-12-01T10:59:24.3133587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3134118Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3134696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3135167Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3135742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3136186Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3136739Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3137204Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3137786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3138235Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3138786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3139252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3139825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3140244Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3140813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3141272Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3141707Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3142245Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3142710Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3143178Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3143553Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3144050Z test_invalid_pg_rpc_ranks (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41354 2022-12-01T10:59:24.3144592Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41355 2022-12-01T10:59:24.3145036Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 41356 2022-12-01T10:59:24.3145458Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 41357 2022-12-01T10:59:24.3146066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3146517Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3147074Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3147549Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3148130Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3148572Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3149123Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3149590Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3150166Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3150614Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3151231Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3151713Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3152296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3152719Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3153288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3153751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3154185Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3154641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3155109Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3155574Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3155948Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3156439Z test_invalid_sharding (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41490 2022-12-01T10:59:24.3156975Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41491 2022-12-01T10:59:24.3157424Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 41492 2022-12-01T10:59:24.3157848Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 41493 2022-12-01T10:59:24.3158452Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3158983Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3159567Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3160024Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3160603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3161046Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3161596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3162060Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3162819Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3163269Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3163829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3164295Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3164870Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3165295Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3165866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3166326Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3166762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3167216Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3167684Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3168278Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3168693Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3169179Z test_load_state_dict_errors (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41626 2022-12-01T10:59:24.3169725Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41627 2022-12-01T10:59:24.3170172Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 41628 2022-12-01T10:59:24.3170592Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 41629 2022-12-01T10:59:24.3171205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3171664Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3172294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3172746Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3173327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3173771Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3174324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3174789Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3175367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3175915Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3176480Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3176948Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3177523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3177966Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3178516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3178980Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3179416Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3179871Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3180337Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3180805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3181201Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3181681Z test_multiple_local_shards (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41762 2022-12-01T10:59:24.3182227Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41763 2022-12-01T10:59:24.3182675Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 41764 2022-12-01T10:59:24.3183095Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 41765 2022-12-01T10:59:24.3183699Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3184148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3184727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3185243Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3185838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3186283Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3186834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3187297Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3187873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3188315Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3188867Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3189340Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3189922Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3190363Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3190917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3191381Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3191816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3192272Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3192738Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3193274Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3193677Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3194143Z test_new_group (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41898 2022-12-01T10:59:24.3194673Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41899 2022-12-01T10:59:24.3195119Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 41900 2022-12-01T10:59:24.3195542Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 41901 2022-12-01T10:59:24.3196155Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3196605Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3197183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3197633Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3198213Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3198660Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3199236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3199683Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3200256Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3200699Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3201246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3201715Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3202351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3203047Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3203607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3204069Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3204504Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3204957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3205425Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3205891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3206280Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3206757Z test_partial_world_size (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42034 2022-12-01T10:59:24.3207298Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42035 2022-12-01T10:59:24.3207745Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42036 2022-12-01T10:59:24.3208168Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42037 2022-12-01T10:59:24.3208779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3209228Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3209804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3210369Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3210959Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3211406Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3211975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3212434Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3212993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3213427Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3213989Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3214437Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3215008Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3215444Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3216002Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3216445Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3216878Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3217345Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3217806Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3218249Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3218632Z skip: Need at least 4 CUDA devices (1.912s) 2022-12-01T10:59:24.3219206Z test_sharded_tensor_metadata (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42170 2022-12-01T10:59:24.3219757Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42171 2022-12-01T10:59:24.3220201Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42172 2022-12-01T10:59:24.3220636Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42173 2022-12-01T10:59:24.3221244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3221671Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3222239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3222708Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3223269Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3223711Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3224271Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3224728Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3225276Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3225718Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3226281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3226820Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3227377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3227820Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3228379Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3228818Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3229252Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3229721Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3230181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3230628Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3231012Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3231503Z test_sharded_tensor_sizes (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42306 2022-12-01T10:59:24.3232021Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42307 2022-12-01T10:59:24.3232460Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42308 2022-12-01T10:59:24.3232889Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42309 2022-12-01T10:59:24.3233495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3233922Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3234489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3234951Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3235511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3236048Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3236637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3237097Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3237648Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3238090Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3238659Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3239115Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3239674Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3240114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3240680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3241122Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3241551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3242017Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3242640Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3243095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3243481Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3244071Z test_sharding_columns (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42442 2022-12-01T10:59:24.3244588Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42443 2022-12-01T10:59:24.3245032Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42444 2022-12-01T10:59:24.3245462Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42445 2022-12-01T10:59:24.3246078Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3246508Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3247078Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3247539Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3248120Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3248544Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3249112Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3249573Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3250129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3250570Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3251133Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3251592Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3252151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3252602Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3253241Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3253708Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3254139Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3254603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3255065Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3255509Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3255887Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3256360Z test_state_dict (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42578 2022-12-01T10:59:24.3256895Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42579 2022-12-01T10:59:24.3257325Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42580 2022-12-01T10:59:24.3257758Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42581 2022-12-01T10:59:24.3258365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3258793Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3259362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3259822Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3260396Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3260898Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3261475Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3261934Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3262487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3262923Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3263489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3263944Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3264500Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3264941Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3265514Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3265957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3266389Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3266859Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3267320Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3267766Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3268192Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3268683Z test_state_dict_new_group (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42714 2022-12-01T10:59:24.3269222Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42715 2022-12-01T10:59:24.3269710Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42716 2022-12-01T10:59:24.3270169Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42717 2022-12-01T10:59:24.3270780Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3271206Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3271775Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3272241Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3272821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3273251Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3273810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3274253Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3274802Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3275270Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3275850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3276307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3276861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3277382Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3277950Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3278412Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3278828Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3279293Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3279755Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3280196Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3280575Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3281076Z test_state_dict_no_sharded_tensors (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42850 2022-12-01T10:59:24.3281627Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42851 2022-12-01T10:59:24.3282058Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42852 2022-12-01T10:59:24.3282647Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42853 2022-12-01T10:59:24.3283265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3283692Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3284261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3285051Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3285632Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3286060Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3286635Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3287188Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3287786Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3288207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3288769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3289226Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3289776Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3290213Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3290781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3291241Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3291656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3292123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3292584Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3293024Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3293412Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3293891Z test_custom_op (__main__.TestShardedTensorCustomOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42986 2022-12-01T10:59:24.3294416Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42987 2022-12-01T10:59:24.3294949Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 42988 2022-12-01T10:59:24.3295384Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 42989 2022-12-01T10:59:24.3295993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3296438Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3296982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3297423Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3297989Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3298437Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3299023Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3299485Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3300060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3300486Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3301043Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3301484Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3302034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3302498Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3303078Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3303541Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3304020Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3304504Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3304970Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3305430Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3305801Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3306288Z test_custom_op_errors (__main__.TestShardedTensorCustomOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43122 2022-12-01T10:59:24.3306826Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43123 2022-12-01T10:59:24.3307255Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43124 2022-12-01T10:59:24.3307683Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43125 2022-12-01T10:59:24.3308294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3308738Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3309290Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3309754Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3310325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3310747Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3311309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3311882Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3312464Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3312885Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3313443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3313900Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3314471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3314892Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3315456Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3315917Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3316333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3316798Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3317261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3317721Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3318093Z skip: Need at least 4 CUDA devices (2.009s) 2022-12-01T10:59:24.3318586Z test_custom_op_override (__main__.TestShardedTensorCustomOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43258 2022-12-01T10:59:24.3319125Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43259 2022-12-01T10:59:24.3319549Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43260 2022-12-01T10:59:24.3319992Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43261 2022-12-01T10:59:24.3320651Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3321115Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3321673Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3322136Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3322895Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3323320Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3323891Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3324350Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3324924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3325346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3325911Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3326367Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3326934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3327353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3327917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3328486Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3328902Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3329376Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3329827Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3330287Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3330660Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3331045Z test_create_sharded_tensor_with_ones (__main__.TestShardedTensorEnumerable) 2022-12-01T10:59:24.3331542Z Test sharded_tensor.ones(...) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43394 2022-12-01T10:59:24.3332010Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43395 2022-12-01T10:59:24.3332452Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43396 2022-12-01T10:59:24.3332883Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43397 2022-12-01T10:59:24.3333497Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3333926Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3334495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3334958Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3335517Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3335957Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3336522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3336988Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3337621Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3338086Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3338655Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3339114Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3339667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3340106Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3340668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3341110Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3341541Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3342011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3342472Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3342918Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3343300Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3343662Z test_gather_even (__main__.TestShardedTensorEnumerable) 2022-12-01T10:59:24.3344157Z Test _sharded_tensor.gather(...) with evenly distributed._shards ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43530 2022-12-01T10:59:24.3344686Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43531 2022-12-01T10:59:24.3345211Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43532 2022-12-01T10:59:24.3345648Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43533 2022-12-01T10:59:24.3346245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3346686Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3347256Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3347703Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3348274Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3348712Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3349273Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3349726Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3350299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3350740Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3351302Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3351746Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3352317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3352754Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3353296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3353756Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3354184Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3354730Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3355199Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3355651Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3356038Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3356382Z test_gather_uneven (__main__.TestShardedTensorEnumerable) 2022-12-01T10:59:24.3356906Z Test _sharded_tensor.gather(...) with unevenly distributed._shards ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43666 2022-12-01T10:59:24.3357436Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43667 2022-12-01T10:59:24.3357880Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43668 2022-12-01T10:59:24.3358304Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43669 2022-12-01T10:59:24.3358915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3359361Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3359929Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3360374Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3360942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3361382Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3361932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3362700Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3363303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3363743Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3364294Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3364753Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3365321Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3365740Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3366303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3366765Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3367200Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3367654Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3385222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3385722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3386124Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3386605Z test_grid_sharding (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43802 2022-12-01T10:59:24.3387150Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43803 2022-12-01T10:59:24.3387596Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43804 2022-12-01T10:59:24.3388035Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43805 2022-12-01T10:59:24.3388837Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3389328Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3389918Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3390372Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3390951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3391396Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3391946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3392421Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3393000Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3393447Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3393993Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3394461Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3395034Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3395473Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3396021Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3396594Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3397034Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3397496Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3397957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3398423Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3398808Z skip: Need at least 4 CUDA devices (1.911s) 2022-12-01T10:59:24.3399290Z test_multiple_local_shards (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43938 2022-12-01T10:59:24.3399839Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43939 2022-12-01T10:59:24.3400286Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 43940 2022-12-01T10:59:24.3400720Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 43941 2022-12-01T10:59:24.3401341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3401793Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3402365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3403122Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3403708Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3404151Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3404720Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3405172Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3405737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3406275Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3406844Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3407292Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3407866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3408334Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3408902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3409365Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3409804Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3410261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3410723Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3411181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3411568Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3412037Z test_new_group (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44074 2022-12-01T10:59:24.3412565Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44075 2022-12-01T10:59:24.3413010Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44076 2022-12-01T10:59:24.3413436Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44077 2022-12-01T10:59:24.3414153Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3414608Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3415185Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3415637Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3416002Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3416177Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3416548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3416737Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3417102Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3417275Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3417647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3417817Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3418185Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3418355Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3418726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3418914Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3419145Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3419377Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3419660Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3419883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3420037Z skip: Need at least 4 CUDA devices (2.010s) 2022-12-01T10:59:24.3420363Z test_partial_world_size (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44210 2022-12-01T10:59:24.3420581Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44211 2022-12-01T10:59:24.3420797Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44212 2022-12-01T10:59:24.3421014Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44213 2022-12-01T10:59:24.3421396Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3421572Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3421954Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3422127Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3422502Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3422674Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3423049Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3423238Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3423602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3423853Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3424236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3424404Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3424772Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3424946Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3425315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3425498Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3425725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3425956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3426179Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3426403Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3426535Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3426867Z test_sharded_tensor_device (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44346 2022-12-01T10:59:24.3427085Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44347 2022-12-01T10:59:24.3427299Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44348 2022-12-01T10:59:24.3427513Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44349 2022-12-01T10:59:24.3427888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3428068Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3428506Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3428691Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3429063Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3429239Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3429612Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3429801Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3430162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3430339Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3430714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3430901Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3431250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3431420Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3431788Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3431972Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3432200Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3432492Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3432712Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3432940Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3433070Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3433405Z test_sharded_tensor_metadata (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44482 2022-12-01T10:59:24.3433622Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44483 2022-12-01T10:59:24.3433835Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44484 2022-12-01T10:59:24.3434048Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44485 2022-12-01T10:59:24.3434418Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3434596Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3434967Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3435142Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3435501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3435692Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3436066Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3436252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3436615Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3436791Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3437158Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3437391Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3437766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3437957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3438337Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3438528Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3438757Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3438985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3439208Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3439433Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3439581Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3439894Z test_sharded_tensor_to_cpu (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44618 2022-12-01T10:59:24.3440111Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44619 2022-12-01T10:59:24.3440325Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44620 2022-12-01T10:59:24.3440539Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44621 2022-12-01T10:59:24.3440911Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3441155Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3441538Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3441731Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3442075Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3442248Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3442862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3443056Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3443423Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3443595Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3443971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3444163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3444535Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3444688Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3445056Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3445243Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3445476Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3445704Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3445927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3446152Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3446408Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3446740Z test_sharded_tensor_to_cuda (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44754 2022-12-01T10:59:24.3446960Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44755 2022-12-01T10:59:24.3447177Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44756 2022-12-01T10:59:24.3447390Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44757 2022-12-01T10:59:24.3447769Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3447945Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3448327Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3448519Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3448884Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3449037Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3449407Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3449597Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3449959Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3450129Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3450596Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3450785Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3451155Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3451311Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3451689Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3451876Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3452103Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3452330Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3452550Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3452782Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3452930Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3453264Z test_sharded_tensor_to_test (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44890 2022-12-01T10:59:24.3453467Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44891 2022-12-01T10:59:24.3453682Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 44892 2022-12-01T10:59:24.3453894Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 44893 2022-12-01T10:59:24.3454270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3454446Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3454821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3455016Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3455440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3455608Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3455988Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3456179Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3456540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3456710Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3457079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3457268Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3457640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3457810Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3458161Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3458345Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3458573Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3458800Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3459023Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3459324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3459474Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3459796Z test_uneven_shards (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45026 2022-12-01T10:59:24.3459997Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45027 2022-12-01T10:59:24.3460211Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 45028 2022-12-01T10:59:24.3460424Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 45029 2022-12-01T10:59:24.3460797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3460971Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3461334Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3461514Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3461898Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3462091Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3462450Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3462638Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3463001Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3463172Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3463540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3463731Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3464155Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3464343Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3464700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3464888Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3465117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3465344Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3465562Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3465785Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3465938Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3466262Z test_with_rpc_names (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45162 2022-12-01T10:59:24.3466478Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45163 2022-12-01T10:59:24.3466674Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 45164 2022-12-01T10:59:24.3466889Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 45165 2022-12-01T10:59:24.3467261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3467435Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3467813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3468133Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3468504Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3468686Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3469039Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3469231Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3469589Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3469759Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3470126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3470311Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3470680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3470852Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3471222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3471390Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3471618Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3471843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3472063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3472332Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3472491Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3473495Z test_init_from_local_shards (__main__.TestShardedTensorFromLocalShards) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/78068 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.002s) 2022-12-01T10:59:24.3473883Z test_init_from_local_shards_and_global_metadata (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45298 2022-12-01T10:59:24.3474104Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45299 2022-12-01T10:59:24.3474320Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 45300 2022-12-01T10:59:24.3474517Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 45301 2022-12-01T10:59:24.3474896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3475073Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3475456Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3475648Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3476010Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3476181Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3476555Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3476726Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3477086Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3477327Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3477703Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3477876Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3478253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3478443Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3478817Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3479006Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3479219Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3479450Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3479678Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3479901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3480052Z skip: Need at least 4 CUDA devices (1.911s) 2022-12-01T10:59:24.3480442Z test_init_from_local_shards_and_global_metadata_invalid_shards (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45434 2022-12-01T10:59:24.3480661Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45435 2022-12-01T10:59:24.3480876Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 45436 2022-12-01T10:59:24.3481073Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 45437 2022-12-01T10:59:24.3481449Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3481624Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3482061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3482265Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3482879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3483057Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3483434Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3483625Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3483961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3484140Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3484513Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3484701Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3485072Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3485246Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3485614Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3485800Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3486010Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3486342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3486564Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3486790Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3486940Z skip: Need at least 4 CUDA devices (1.911s) 2022-12-01T10:59:24.3487311Z test_init_from_local_shards_invalid_local_shards (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45570 2022-12-01T10:59:24.3487529Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45571 2022-12-01T10:59:24.3487746Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 45572 2022-12-01T10:59:24.3487955Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 45573 2022-12-01T10:59:24.3488313Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3488494Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3488878Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3489069Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3489432Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3489601Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3489974Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3490162Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3490520Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3490678Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3491123Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3491330Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3491707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3491878Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3492246Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3492432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3492657Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3492872Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3493091Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3493318Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3493467Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3493836Z test_init_from_local_shards_invalid_pin_memory (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45706 2022-12-01T10:59:24.3494054Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45707 2022-12-01T10:59:24.3494270Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 45708 2022-12-01T10:59:24.3494482Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 45709 2022-12-01T10:59:24.3494856Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3495082Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3495473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3495663Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3496027Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3496198Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3496571Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3496762Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3497121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3497278Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3497649Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3497838Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3498207Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3498377Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3498744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3498929Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3499155Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3499380Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3499585Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3499920Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3500184Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T10:59:24.3500431Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T10:59:24.3500671Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-12-01T10:59:24.3501079Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2022-12-01T10:59:24.3501319Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-12-01T10:59:24.3501718Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T10:59:24.3502127Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T10:59:24.3502501Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T10:59:24.3502655Z skip: Need at least 4 CUDA devices (2.009s) 2022-12-01T10:59:24.3503031Z test_init_from_local_shards_invalid_property_cross_ranks (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45854 2022-12-01T10:59:24.3503243Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45855 2022-12-01T10:59:24.3503448Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 45856 2022-12-01T10:59:24.3503655Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 45857 2022-12-01T10:59:24.3504116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3504293Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3504660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3504853Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3505217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3505380Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3505747Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3505937Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3506296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3506471Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3506841Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3507011Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3507378Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3507549Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3507917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3508104Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3508333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3508563Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3508860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3509103Z 
INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3509236Z skip: Need at least 4 CUDA devices (2.010s) 2022-12-01T10:59:24.3509608Z test_init_from_local_shards_invalid_shards_gaps (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45990 2022-12-01T10:59:24.3509827Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45991 2022-12-01T10:59:24.3510043Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 45992 2022-12-01T10:59:24.3510256Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 45993 2022-12-01T10:59:24.3510641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3510819Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3511199Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3511371Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3511739Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3511911Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3512281Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3512471Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3512830Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3513071Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3513454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3513641Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3513987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3514158Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3514522Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3514708Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3514937Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3515169Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3515389Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3515616Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3515748Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3516120Z test_init_from_local_shards_invalid_shards_overlap (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46126 2022-12-01T10:59:24.3516341Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46127 2022-12-01T10:59:24.3516556Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46128 2022-12-01T10:59:24.3516766Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46129 2022-12-01T10:59:24.3517140Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3517318Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3517754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3517957Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3518309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3518486Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3518859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3519046Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3519408Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3519582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3519951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3520139Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3520487Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3520657Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3521025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3521211Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3521437Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3521733Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3521954Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3522182Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3522333Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3522910Z test_init_from_local_shards_new_group (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46262 2022-12-01T10:59:24.3523137Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46263 2022-12-01T10:59:24.3523353Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46264 2022-12-01T10:59:24.3523568Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46265 2022-12-01T10:59:24.3523952Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3524134Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3524515Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3524709Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3525057Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3525229Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3525602Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3525791Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3526152Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3526326Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3526782Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3526992Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3527367Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3527519Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3527889Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3528073Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3528303Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3528535Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3528758Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3528985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3529135Z skip: Need at least 4 CUDA devices (2.009s) 2022-12-01T10:59:24.3529451Z test_local_shards (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46398 2022-12-01T10:59:24.3529667Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46399 2022-12-01T10:59:24.3529881Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46400 2022-12-01T10:59:24.3530092Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46401 2022-12-01T10:59:24.3530465Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3530731Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3531115Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3531310Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3531675Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3531827Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3532198Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3532388Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3532750Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3532927Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3533307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3533499Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3533861Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3534014Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3534388Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3534573Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3534799Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3535023Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3535244Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3535469Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3535677Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3536076Z test_st_base_init_from_local_shards_and_global_metadata (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46534 2022-12-01T10:59:24.3536278Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46535 2022-12-01T10:59:24.3536493Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46536 2022-12-01T10:59:24.3536706Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46537 2022-12-01T10:59:24.3537078Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3537260Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3537634Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3537826Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3538189Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3538342Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3538713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3538901Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3539259Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3539430Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3539873Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3540062Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3540429Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3540601Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3540946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3541135Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3541363Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3541588Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3541811Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3542035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3542185Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3542530Z test_init_from_local_tensor (__main__.TestShardedTensorFromLocalTensor) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46670 2022-12-01T10:59:24.3542729Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46671 2022-12-01T10:59:24.3542944Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46672 2022-12-01T10:59:24.3543155Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46673 2022-12-01T10:59:24.3543527Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3543702Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3544084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3544335Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3544719Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3544894Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3545249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3545438Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3545797Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3545968Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3546336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3546526Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3546898Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3547071Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3547418Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3547606Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3547836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3548062Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3548278Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3548567Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3548719Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T10:59:24.3549076Z test_init_from_local_tensor_errors (__main__.TestShardedTensorFromLocalTensor) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46806 2022-12-01T10:59:24.3549295Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46807 2022-12-01T10:59:24.3549491Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 46808 2022-12-01T10:59:24.3549706Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 46809 2022-12-01T10:59:24.3550085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3550259Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3550623Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3550795Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3551170Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3551343Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3551700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3551890Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3552264Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3552449Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3552821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3553009Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3553435Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T10:59:24.3553621Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T10:59:24.3553996Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T10:59:24.3554164Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T10:59:24.3554392Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T10:59:24.3554620Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T10:59:24.3554844Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T10:59:24.3555068Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T10:59:24.3555216Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T10:59:24.3555441Z test_serialize_and_deserialize (__main__.TestShardedTensorMetadata) ... ok (0.054s) 2022-12-01T10:59:24.3555465Z 2022-12-01T10:59:24.3555736Z ---------------------------------------------------------------------- 2022-12-01T10:59:24.3555837Z Ran 64 tests in 118.628s 2022-12-01T10:59:24.3555876Z 2022-12-01T10:59:24.3555967Z OK (skipped=62) 2022-12-01T10:59:24.3555986Z 2022-12-01T10:59:24.3556113Z Generating XML reports... 
2022-12-01T10:59:24.3556632Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestCreateTensorFromParams-20221201105725.xml 2022-12-01T10:59:24.3557135Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorMetadata-20221201105725.xml 2022-12-01T10:59:24.3557671Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestLocalTensor-20221201105725.xml 2022-12-01T10:59:24.3558145Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestModuleHookApi-20221201105725.xml 2022-12-01T10:59:24.3558602Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardMetadata-20221201105725.xml 2022-12-01T10:59:24.3559070Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardParameter-20221201105725.xml 2022-12-01T10:59:24.3559521Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardTensor-20221201105725.xml 2022-12-01T10:59:24.3560007Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorChunked-20221201105725.xml 2022-12-01T10:59:24.3560500Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorCustomOps-20221201105725.xml 2022-12-01T10:59:24.3561011Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorEnumerable-20221201105725.xml 2022-12-01T10:59:24.3561539Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalShards-20221201105725.xml 2022-12-01T10:59:24.3562067Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalTensor-20221201105725.xml 2022-12-01T10:59:24.3562087Z 2022-12-01T10:59:24.3562862Z ##[endgroup] 2022-12-01T10:59:24.3563429Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/test_sharded_tensor (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-test_sharded_tensor_qbnh3ebn) 2022-12-01T10:59:24.3563456Z 2022-12-01T10:59:24.3563714Z Running distributed/test_c10d_pypg ... [2022-12-01 10:59:24.287477] 2022-12-01T10:59:24.3564276Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_c10d_pypg.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 10:59:24.287733] 2022-12-01T11:01:46.0045250Z 2022-12-01T11:01:46.0045681Z Expand the folded group to see the log file of distributed/test_c10d_pypg 2022-12-01T11:01:46.0048587Z ##[group]PRINTING LOG FILE of distributed/test_c10d_pypg (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_pypg_c1izh2lc) 2022-12-01T11:01:46.0049050Z 2022-12-01T11:01:46.0049225Z Running tests... 
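The `Executing [...]` line above prints the exact argv the CI harness uses for this test file, including the `--import-slow-tests` / `--import-disabled-tests` flags that produce the repeated "loaded 62 slow tests" / "loaded 421 disabled tests" warnings. A minimal sketch of re-running the same invocation outside CI, assuming the same interpreter path and the pytorch `test/` directory as the working directory:

```python
# Hedged sketch: reproduce the invocation printed in the log above.
# Assumes /opt/conda/bin/python exists and cwd is the pytorch test/ directory.
import subprocess

cmd = [
    "/opt/conda/bin/python", "-bb", "distributed/test_c10d_pypg.py",
    "-v", "--import-slow-tests", "--import-disabled-tests",
]
# check=True raises CalledProcessError if the test process exits non-zero.
subprocess.run(cmd, check=True)
```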
2022-12-01T11:01:46.0050234Z ---------------------------------------------------------------------- 2022-12-01T11:01:46.0050797Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_pypg 2022-12-01T11:01:46.0051278Z test_ddp_checkpointing_dynamic_module (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0052379Z Dynamic module can be checkpointed, multiple times, with non-reentrant ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:01:46.0052901Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46977 2022-12-01T11:01:46.0053543Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0053980Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0054566Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0055042Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0055487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0055970Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_1jviw5x 2022-12-01T11:01:46.0056817Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_1jviw5x/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0060005Z ok (5.546s) 2022-12-01T11:01:46.0060481Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0061079Z Dynamic module can be checkpointed multiple times with weight sharing ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47014 2022-12-01T11:01:46.0061854Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0062296Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0062877Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0063376Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0063821Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0064312Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsaav3z6m 2022-12-01T11:01:46.0064869Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsaav3z6m/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0065259Z ok (4.009s) 2022-12-01T11:01:46.0069972Z test_ddp_checkpointing_once_use_reentrant_False (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0071442Z DDP works as expected when layer is checkpointed only once. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47051 2022-12-01T11:01:46.0072196Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0072674Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0073289Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0073755Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0074419Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0075303Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_4te9gsy 2022-12-01T11:01:46.0076330Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_4te9gsy/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0077303Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0078196Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0080439Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T11:01:46.0082020Z warnings.warn( 2022-12-01T11:01:46.0083182Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0083678Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0084162Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0084648Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0084980Z ok (4.110s) 2022-12-01T11:01:46.0085373Z test_ddp_checkpointing_once_use_reentrant_True (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0086145Z DDP works as expected when layer is checkpointed only once. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47088 2022-12-01T11:01:46.0086866Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0087471Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0088062Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0088542Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0089162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0089678Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphwq3qdfr 2022-12-01T11:01:46.0090227Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphwq3qdfr/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0090752Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0091224Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T11:01:46.0092410Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T11:01:46.0093149Z warnings.warn( 2022-12-01T11:01:46.0093530Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0093998Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0094480Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0094963Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0095317Z ok (4.210s) 2022-12-01T11:01:46.0095682Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0096495Z Regardless of reentrant or non-reentrant checkpointing impl, ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47125 2022-12-01T11:01:46.0097223Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0097660Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0098237Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0098708Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0099147Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0099629Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp49b4znlr 2022-12-01T11:01:46.0100173Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp49b4znlr/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0100696Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0101187Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0101530Z ok (4.110s) 2022-12-01T11:01:46.0101911Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0102626Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
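The UserWarning from torch/nn/parallel/distributed.py that recurs in these checkpointing tests states that `find_unused_parameters=True` is redundant once DDP's static-graph mode is enabled, because static graph detects unused parameters on its own. A minimal sketch of that setup, assuming a single-process gloo group on CPU and a toy model rather than the tests' actual modules:

```python
# Hedged sketch: DDP with static graph enabled (single-process gloo group, CPU).
# The tests call the private _set_static_graph(); the public constructor flag
# static_graph=True is the equivalent shown here.  Adding
# find_unused_parameters=True on top of this is what triggers the UserWarning
# seen in the log.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(8, 8)
ddp = DDP(model, static_graph=True)

loss = ddp(torch.randn(2, 8)).sum()
loss.backward()
dist.destroy_process_group()
```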
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47162 2022-12-01T11:01:46.0103306Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0103763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0104343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0104902Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0105327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0105838Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm1atqsw9 2022-12-01T11:01:46.0106390Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm1atqsw9/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0106896Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0107390Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0107748Z ok (4.109s) 2022-12-01T11:01:46.0108111Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0108817Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47199 2022-12-01T11:01:46.0109536Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0109993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0110551Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0111026Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0111465Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0111976Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf7crhwit 2022-12-01T11:01:46.0112502Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf7crhwit/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0113026Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0114132Z [W reducer.cpp:1305] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-12-01T11:01:46.0115153Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0115509Z ok (4.110s) 2022-12-01T11:01:46.0115852Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0116573Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47236 2022-12-01T11:01:46.0117288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0117741Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0118303Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0118780Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0119225Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0119720Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_r_c2a7m 2022-12-01T11:01:46.0120245Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_r_c2a7m/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0120629Z ok (4.110s) 2022-12-01T11:01:46.0120984Z test_ddp_checkpointing_twice_weight_sharing (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0121605Z Checkpointing should work with static graph in the case of checkpointing ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47273 2022-12-01T11:01:46.0122322Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0123349Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0123942Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0124397Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0124836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0125342Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptdi584nl 2022-12-01T11:01:46.0125887Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptdi584nl/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0126394Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0126883Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0127244Z ok (4.110s) 2022-12-01T11:01:46.0127605Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0128189Z With reentrant autograd checkpointing impl, DDP will fail when there are ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47310 2022-12-01T11:01:46.0128898Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0129356Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0129913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0130386Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0130831Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0131423Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpczqe6pn7 2022-12-01T11:01:46.0132000Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpczqe6pn7/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0133066Z [W reducer.cpp:1305] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-12-01T11:01:46.0134756Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T11:01:46.0135487Z warnings.warn( 2022-12-01T11:01:46.0135871Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0136344Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0136699Z ok (4.110s) 2022-12-01T11:01:46.0137070Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0137636Z With reentrant autograd checkpointing impl, DDP will fail when there are ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47347 2022-12-01T11:01:46.0138445Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0138902Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0139489Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0139943Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0140390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0140899Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxnuc1t14 2022-12-01T11:01:46.0141429Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxnuc1t14/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0142632Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T11:01:46.0143367Z warnings.warn( 2022-12-01T11:01:46.0143751Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0144245Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0144582Z ok (4.110s) 2022-12-01T11:01:46.0144958Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0145518Z Test that checkpointing with weight sharing works. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47384 2022-12-01T11:01:46.0146183Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0146639Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0147227Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0147755Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0148199Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0148704Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2t1kmb5j 2022-12-01T11:01:46.0149253Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2t1kmb5j/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0149756Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0150245Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0150601Z ok (4.010s) 2022-12-01T11:01:46.0150975Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.TestDDPWithWorkSubclass) 2022-12-01T11:01:46.0151521Z Test that checkpointing with weight sharing works. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47421 2022-12-01T11:01:46.0152210Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0152666Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0153225Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0153701Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0154144Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0154653Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg3te9ibm 2022-12-01T11:01:46.0155178Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg3te9ibm/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0155778Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0156271Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0156760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0157224Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0157620Z ok (4.111s) 2022-12-01T11:01:46.0158073Z test_ddp_invoke_work_object (__main__.TestDDPWithWorkSubclass) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47458 2022-12-01T11:01:46.0158762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0159211Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0159795Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0160272Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0160699Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0161204Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqwo4yul3 2022-12-01T11:01:46.0161752Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqwo4yul3/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0162120Z ok (2.206s) 2022-12-01T11:01:46.0163051Z test_ddp_with_pypg (__main__.TestDDPWithWorkSubclass) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47494 2022-12-01T11:01:46.0163766Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0164223Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0164782Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0165350Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0165813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0166301Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph9ey9z71 2022-12-01T11:01:46.0166848Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph9ey9z71/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0167370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0167730Z ok (2.129s) 2022-12-01T11:01:46.0168168Z test_ddp_with_pypg_with_grad_views (__main__.TestDDPWithWorkSubclass) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47530 2022-12-01T11:01:46.0168877Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0169339Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0169924Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0170380Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0170820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0171324Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpghx7tyu9 2022-12-01T11:01:46.0171854Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpghx7tyu9/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0172380Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0172734Z ok (2.106s) 2022-12-01T11:01:46.0173182Z test_invalid_powerSGD_state (__main__.TestDDPWithWorkSubclass) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47566 2022-12-01T11:01:46.0173988Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0174441Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0175014Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0175469Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0175910Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0176709Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0177805Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0178888Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0179963Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0181094Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0182164Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0182798Z ok (2.106s) 2022-12-01T11:01:46.0183251Z test_sync_batch_norm_empty_input (__main__.TestDDPWithWorkSubclass) ... 
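The PowerSGD config lines above are emitted while test_invalid_powerSGD_state constructs PowerSGDState objects with various parameter combinations (matrix_approximation_rank, start_powerSGD_iter, error feedback, warm start, and so on). A minimal sketch of registering the PowerSGD gradient-compression hook with one valid configuration, assuming the same kind of single-process gloo setup as the earlier sketch; values other than matrix_approximation_rank=1 are illustrative, not taken from the tests:

```python
# Hedged sketch: register the PowerSGD DDP communication hook.
# Single-process gloo group on CPU; parameter values are illustrative.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

ddp = DDP(torch.nn.Linear(8, 8))
state = powerSGD.PowerSGDState(
    process_group=None,            # None means "use the default process group"
    matrix_approximation_rank=1,   # matches the rank reported in the log lines
    start_powerSGD_iter=2,         # must be > 1 when warm_start/error feedback are on
)
ddp.register_comm_hook(state, powerSGD.powerSGD_hook)

# The first iterations (before start_powerSGD_iter) fall back to plain allreduce.
ddp(torch.randn(2, 8)).sum().backward()
dist.destroy_process_group()
```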
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47600 2022-12-01T11:01:46.0183971Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0184408Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0184987Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0185458Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0185898Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0186383Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpumgwo98n 2022-12-01T11:01:46.0186934Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpumgwo98n/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0187457Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0187883Z ok (3.609s) 2022-12-01T11:01:46.0188327Z test_sync_batch_norm_only_empty_input (__main__.TestDDPWithWorkSubclass) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47637 2022-12-01T11:01:46.0189037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0189493Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0190058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0190522Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0190964Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0191474Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgozw4ktn 2022-12-01T11:01:46.0192004Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgozw4ktn/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0192528Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0192892Z ok (3.509s) 2022-12-01T11:01:46.0193218Z test_ddp_checkpointing_dynamic_module (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0193917Z Dynamic module can be checkpointed, multiple times, with non-reentrant ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47674 2022-12-01T11:01:46.0194621Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0195074Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0195631Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0196096Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0196539Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0197099Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbq16bm3c 2022-12-01T11:01:46.0197645Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbq16bm3c/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0198030Z ok (4.110s) 2022-12-01T11:01:46.0198389Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0198942Z Dynamic module can be checkpointed multiple times with weight sharing ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47711 2022-12-01T11:01:46.0199644Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0200096Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0200673Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0201137Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0201579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0202083Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpodwrdhr5 2022-12-01T11:01:46.0203142Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpodwrdhr5/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0203540Z ok (4.010s) 2022-12-01T11:01:46.0203901Z test_ddp_checkpointing_once_use_reentrant_False (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0204458Z DDP works as expected when layer is checkpointed only once. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47748 2022-12-01T11:01:46.0205139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0205708Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0206293Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0206751Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0207196Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0207699Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplot9w7w6 2022-12-01T11:01:46.0208240Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplot9w7w6/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0208741Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0209229Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T11:01:46.0210402Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T11:01:46.0211140Z warnings.warn( 2022-12-01T11:01:46.0211501Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0211983Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0212466Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0212950Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0213284Z ok (4.110s) 2022-12-01T11:01:46.0213637Z test_ddp_checkpointing_once_use_reentrant_True (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0214200Z DDP works as expected when layer is checkpointed only once. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47785 2022-12-01T11:01:46.0214953Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0215424Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0216009Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0216482Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0216900Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0217407Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg4ftr4zn 2022-12-01T11:01:46.0217955Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg4ftr4zn/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0218461Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0218959Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0220134Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T11:01:46.0220868Z warnings.warn( 2022-12-01T11:01:46.0221246Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0221710Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0222188Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0222742Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0223073Z ok (4.210s) 2022-12-01T11:01:46.0223462Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0224178Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
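The UserWarning from distributed.py above fires when a model is wrapped with find_unused_parameters=True and the static graph is then enabled. A minimal single-process sketch of that construction, assuming a gloo group initialized over env:// with world_size 1 (the two-layer model is illustrative only):

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Single-process gloo group so the sketch runs on one CPU machine.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 8))
    # find_unused_parameters=True plus a static graph is redundant, which is
    # presumably what triggers the warning quoted above.
    ddp = DDP(model, find_unused_parameters=True)
    ddp._set_static_graph()  # newer code can pass static_graph=True to the constructor instead

    ddp(torch.randn(4, 8)).sum().backward()
    dist.destroy_process_group()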
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47822 2022-12-01T11:01:46.0224882Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0225322Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0225897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0226370Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0226793Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0227304Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn2h9386j 2022-12-01T11:01:46.0227853Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn2h9386j/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0228369Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0228834Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0229183Z ok (4.110s) 2022-12-01T11:01:46.0229564Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0230253Z Regardless of reentrant or non-reentrant checkpointing impl, ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47859 2022-12-01T11:01:46.0230951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0231406Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0232036Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0232504Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0232943Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0233447Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplrppv7ga 2022-12-01T11:01:46.0233999Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplrppv7ga/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0234499Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0234984Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0235340Z ok (4.110s) 2022-12-01T11:01:46.0235690Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0236413Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47896 2022-12-01T11:01:46.0237125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0237582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0238141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0238613Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0239054Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0239540Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzrqhjdcy 2022-12-01T11:01:46.0240162Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzrqhjdcy/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0240694Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0241732Z [W reducer.cpp:1305] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-12-01T11:01:46.0243206Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0243549Z ok (4.110s) 2022-12-01T11:01:46.0243909Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0244639Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47933 2022-12-01T11:01:46.0245350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0245787Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0246364Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0246839Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0247259Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0247767Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp65svzwze 2022-12-01T11:01:46.0248309Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp65svzwze/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0248696Z ok (4.110s) 2022-12-01T11:01:46.0249033Z test_ddp_checkpointing_twice_weight_sharing (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0249697Z Checkpointing should work with static graph in the case of checkpointing ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47970 2022-12-01T11:01:46.0250431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0250864Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0251439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0251905Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0252347Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0252839Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiok7pbvg 2022-12-01T11:01:46.0253388Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiok7pbvg/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0253910Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0254401Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0254736Z ok (4.110s) 2022-12-01T11:01:46.0255109Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0255691Z With reentrant autograd checkpointing impl, DDP will fail when there are ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48007 2022-12-01T11:01:46.0256380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0256832Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0257619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0258103Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0258525Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0259034Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnwly8nzf 2022-12-01T11:01:46.0259579Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnwly8nzf/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0260641Z [W reducer.cpp:1305] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-12-01T11:01:46.0262312Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T11:01:46.0263026Z warnings.warn( 2022-12-01T11:01:46.0263406Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T11:01:46.0263898Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0264254Z ok (4.110s) 2022-12-01T11:01:46.0264602Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0265187Z With reentrant autograd checkpointing impl, DDP will fail when there are ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48044 2022-12-01T11:01:46.0265947Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0266395Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0266980Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0267456Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0267897Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0268382Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5b3vckke 2022-12-01T11:01:46.0268927Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5b3vckke/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0270131Z /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py:1772: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-12-01T11:01:46.0270863Z warnings.warn( 2022-12-01T11:01:46.0271222Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0271713Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0272068Z ok (4.110s) 2022-12-01T11:01:46.0272427Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0272983Z Test that checkpointing with weight sharing works. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48081 2022-12-01T11:01:46.0273742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0274199Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0274758Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0275231Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0275671Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0276178Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb0p3od58 2022-12-01T11:01:46.0276709Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb0p3od58/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0277230Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0277722Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-12-01T11:01:46.0278056Z ok (4.110s) 2022-12-01T11:01:46.0278430Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.TestDDPWithWorkWrapper) 2022-12-01T11:01:46.0278985Z Test that checkpointing with weight sharing works. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48118 2022-12-01T11:01:46.0279672Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0280109Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0280683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0281152Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0281575Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0282087Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3zxcdpqv 2022-12-01T11:01:46.0283215Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3zxcdpqv/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0283770Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0284239Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0284723Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0285204Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0285556Z ok (4.110s) 2022-12-01T11:01:46.0285988Z test_ddp_invoke_work_object (__main__.TestDDPWithWorkWrapper) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48155 2022-12-01T11:01:46.0286701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0287161Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0287722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0288193Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0288638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0289143Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3xxbvfny 2022-12-01T11:01:46.0289674Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3xxbvfny/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0290058Z ok (2.106s) 2022-12-01T11:01:46.0290494Z test_ddp_with_pypg (__main__.TestDDPWithWorkWrapper) ... 
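The test_ddp_checkpointing_* cases above exercise torch.utils.checkpoint under DDP in both its reentrant and non-reentrant forms. A minimal sketch of the underlying API outside DDP (the two-layer module is made up for illustration):

    import torch
    from torch.utils.checkpoint import checkpoint

    layer1 = torch.nn.Linear(8, 8)
    layer2 = torch.nn.Linear(8, 8)
    x = torch.randn(4, 8, requires_grad=True)

    # Non-reentrant checkpointing (use_reentrant=False) recomputes layer1's
    # activations during backward instead of storing them; use_reentrant=True
    # selects the older autograd.Function-based implementation that the
    # *_use_reentrant_True tests above cover.
    h = checkpoint(layer1, x, use_reentrant=False)
    layer2(h).sum().backward()
    print(x.grad.shape)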
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48191 2022-12-01T11:01:46.0291274Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0291731Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0292309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0292778Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0293196Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0293698Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7_4w51y6 2022-12-01T11:01:46.0294235Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7_4w51y6/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0294731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0295085Z ok (2.206s) 2022-12-01T11:01:46.0295544Z test_ddp_with_pypg_with_grad_views (__main__.TestDDPWithWorkWrapper) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48227 2022-12-01T11:01:46.0296249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0296680Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0297256Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0297721Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0298159Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0298640Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqomkzmqw 2022-12-01T11:01:46.0299184Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqomkzmqw/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0299706Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0300040Z ok (2.106s) 2022-12-01T11:01:46.0300555Z test_invalid_powerSGD_state (__main__.TestDDPWithWorkWrapper) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48263 2022-12-01T11:01:46.0301277Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0301726Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0302286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0302753Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0303188Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0303997Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0305078Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0306162Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0307312Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0308396Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0309471Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-12-01T11:01:46.0310111Z ok (2.106s) 2022-12-01T11:01:46.0310569Z test_sync_batch_norm_empty_input (__main__.TestDDPWithWorkWrapper) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48297 2022-12-01T11:01:46.0311261Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0311714Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0312289Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0312744Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0313184Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0313689Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphwshuuyb 2022-12-01T11:01:46.0314242Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphwshuuyb/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0314815Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0315192Z ok (3.509s) 2022-12-01T11:01:46.0315652Z test_sync_batch_norm_only_empty_input (__main__.TestDDPWithWorkWrapper) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48334 2022-12-01T11:01:46.0316365Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:01:46.0316797Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:01:46.0317377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:01:46.0317852Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:01:46.0318279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:01:46.0318784Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw06k6ygl 2022-12-01T11:01:46.0319331Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw06k6ygl/_remote_module_non_scriptable.py 2022-12-01T11:01:46.0319849Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:01:46.0320186Z ok (3.610s) 2022-12-01T11:01:46.0320335Z 2022-12-01T11:01:46.0320609Z ---------------------------------------------------------------------- 2022-12-01T11:01:46.0320946Z Ran 38 tests in 139.506s 2022-12-01T11:01:46.0321112Z 2022-12-01T11:01:46.0321187Z OK 2022-12-01T11:01:46.0321322Z 2022-12-01T11:01:46.0321447Z Generating XML reports... 2022-12-01T11:01:46.0322055Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_pypg/TEST-TestDDPWithWorkSubclass-20221201105926.xml 2022-12-01T11:01:46.0323383Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_pypg/TEST-TestDDPWithWorkWrapper-20221201105926.xml 2022-12-01T11:01:46.0323840Z 2022-12-01T11:01:46.0324259Z ##[endgroup] 2022-12-01T11:01:46.0324837Z FINISHED PRINTING LOG FILE of distributed/test_c10d_pypg (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_pypg_c1izh2lc) 2022-12-01T11:01:46.0325171Z 2022-12-01T11:01:46.0325434Z Running distributed/test_pg_wrapper ... [2022-12-01 11:01:46.004796] 2022-12-01T11:01:46.0326116Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_pg_wrapper.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
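Each test file is launched in its own interpreter with the exact command echoed above; a sketch reproducing that invocation with subprocess (the interpreter path and flags are copied verbatim from the log and assume the CI container layout):

    import subprocess

    # Same command the harness logs for distributed/test_pg_wrapper above.
    cmd = [
        "/opt/conda/bin/python", "-bb", "distributed/test_pg_wrapper.py", "-v",
        "--subprocess", "--import-slow-tests", "--import-disabled-tests",
    ]
    subprocess.run(cmd, check=True)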
[2022-12-01 11:01:46.005053] 2022-12-01T11:03:23.3914566Z 2022-12-01T11:03:23.3915040Z Expand the folded group to see the log file of distributed/test_pg_wrapper 2022-12-01T11:03:23.3933837Z ##[group]PRINTING LOG FILE of distributed/test_pg_wrapper (/var/lib/jenkins/workspace/test/test-reports/distributed-test_pg_wrapper_tgjo6mzb) 2022-12-01T11:03:23.3934667Z 2022-12-01T11:03:23.3935317Z 2022-12-01T11:03:23.3938041Z , <__main__.ProcessGroupGlooWrapperTest testMethod=test_collective_shape_mismatch>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collective_shape_mismatch_cuda>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collective_shape_mismatch_cuda_debug_mode>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collective_shape_mismatch_debug_mode>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collectives_op_mismatch>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collectives_op_mismatch_cuda>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collectives_op_mismatch_cuda_debug_mode>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collectives_op_mismatch_debug_mode>]> 2022-12-01T11:03:23.3940454Z test_collective_hang (__main__.ProcessGroupGlooWrapperTest) 2022-12-01T11:03:23.3941429Z test_collective_shape_mismatch (__main__.ProcessGroupGlooWrapperTest) 2022-12-01T11:03:23.3941998Z test_collective_shape_mismatch_cuda (__main__.ProcessGroupGlooWrapperTest) 2022-12-01T11:03:23.3942725Z test_collective_shape_mismatch_cuda_debug_mode (__main__.ProcessGroupGlooWrapperTest) 2022-12-01T11:03:23.3943242Z test_collective_shape_mismatch_debug_mode (__main__.ProcessGroupGlooWrapperTest) 2022-12-01T11:03:23.3943696Z test_collectives_op_mismatch (__main__.ProcessGroupGlooWrapperTest) 2022-12-01T11:03:23.3944145Z test_collectives_op_mismatch_cuda (__main__.ProcessGroupGlooWrapperTest) 2022-12-01T11:03:23.3944734Z test_collectives_op_mismatch_cuda_debug_mode (__main__.ProcessGroupGlooWrapperTest) 2022-12-01T11:03:23.3945716Z test_collectives_op_mismatch_debug_mode (__main__.ProcessGroupGlooWrapperTest) 2022-12-01T11:03:23.3947211Z , <__main__.ProcessGroupNCCLWrapperTest testMethod=test_collective_shape_mismatch>, <__main__.ProcessGroupNCCLWrapperTest testMethod=test_collective_shape_mismatch_debug_mode>, <__main__.ProcessGroupNCCLWrapperTest testMethod=test_collectives_op_mismatch>, <__main__.ProcessGroupNCCLWrapperTest testMethod=test_collectives_op_mismatch_debug_mode>]> 2022-12-01T11:03:23.3948216Z test_collective_hang (__main__.ProcessGroupNCCLWrapperTest) 2022-12-01T11:03:23.3948669Z test_collective_shape_mismatch (__main__.ProcessGroupNCCLWrapperTest) 2022-12-01T11:03:23.3949227Z test_collective_shape_mismatch_debug_mode (__main__.ProcessGroupNCCLWrapperTest) 2022-12-01T11:03:23.3950047Z test_collectives_op_mismatch (__main__.ProcessGroupNCCLWrapperTest) 2022-12-01T11:03:23.3950966Z test_collectives_op_mismatch_debug_mode (__main__.ProcessGroupNCCLWrapperTest) 2022-12-01T11:03:23.3952316Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.3953084Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.3954338Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.3955151Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.3955582Z 2022-12-01T11:03:23.3955755Z Running tests... 
2022-12-01T11:03:23.3956425Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.3957426Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.3958331Z test_collective_hang (__main__.ProcessGroupGlooWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.3959229Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48439 2022-12-01T11:03:23.3960132Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48440 2022-12-01T11:03:23.3960958Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 48441 2022-12-01T11:03:23.3961403Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 48442 2022-12-01T11:03:23.3962018Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.3963152Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.3963763Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.3964231Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.3964803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.3965225Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.3965789Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.3966252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.3966803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.3967366Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.3967958Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.3968413Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.3968969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.3969415Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.3969978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.3970438Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.3970861Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.3971335Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:03:23.3971801Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:03:23.3972248Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.3972980Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:03:23.3973472Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-12-01T11:03:23.3973964Z INFO:torch.distributed.distributed_c10d:Added key: 
store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:03:23.3974434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-12-01T11:03:23.3975201Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.3975891Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.3976572Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.3977231Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.3977745Z [E ProcessGroupGloo.cpp:2803] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 2000 ms 2022-12-01T11:03:23.3978209Z [E ProcessGroupGloo.cpp:137] [Rank 0]: Ranks 1 failed to pass monitoredBarrier in 2000 ms 2022-12-01T11:03:23.3978795Z [E ProcessGroupGloo.cpp:137] Rank 2 successfully reached monitoredBarrier, but received errors while waiting for send/recv from rank 0. Please check rank 0 logs for faulty rank. 2022-12-01T11:03:23.3979473Z [E ProcessGroupGloo.cpp:137] Rank 3 successfully reached monitoredBarrier, but received errors while waiting for send/recv from rank 0. Please check rank 0 logs for faulty rank. 2022-12-01T11:03:23.3979922Z ok (4.074s) 2022-12-01T11:03:23.3980072Z 2022-12-01T11:03:23.3980341Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.3980650Z Ran 1 test in 4.075s 2022-12-01T11:03:23.3980810Z 2022-12-01T11:03:23.3980900Z OK 2022-12-01T11:03:23.3981032Z 2022-12-01T11:03:23.3981156Z Generating XML reports... 2022-12-01T11:03:23.3981785Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110149.xml 2022-12-01T11:03:23.3982503Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.3982947Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.3983523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.3984038Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.3984285Z 2022-12-01T11:03:23.3984393Z Running tests... 2022-12-01T11:03:23.3984797Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.3985321Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.3985839Z test_collective_shape_mismatch (__main__.ProcessGroupGlooWrapperTest) ... 
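The [E ProcessGroupGloo.cpp] lines above come from test_collective_hang, where one rank never joins and the gloo monitored barrier reports it after the 2000 ms timeout. A sketch of the public API involved, using a trivially passing single-rank group so the call returns immediately (the multi-rank hang itself needs the full test harness):

    import os
    from datetime import timedelta

    import torch.distributed as dist

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("gloo", rank=0, world_size=1)

    # With a straggler rank this raises RuntimeError naming the rank(s) that
    # failed to reach the barrier within the timeout, as in the log above.
    dist.monitored_barrier(timeout=timedelta(milliseconds=2000))
    dist.destroy_process_group()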
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.3986348Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48646 2022-12-01T11:03:23.3986794Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48647 2022-12-01T11:03:23.3987215Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 48648 2022-12-01T11:03:23.3987664Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 48649 2022-12-01T11:03:23.3988263Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.3988708Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.3989249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.3989694Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.3990262Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.3990726Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.3991286Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.3991744Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.3992391Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.3992819Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.3993386Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.3993847Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.3994417Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.3994840Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.3995402Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.3995860Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.3996282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:03:23.3996752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.3997215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.3997676Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:03:23.3998145Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:03:23.3998638Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:03:23.3999124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-12-01T11:03:23.3999609Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-12-01T11:03:23.4000253Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2022-12-01T11:03:23.4001001Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4001699Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4002853Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4003330Z ok (4.159s) 2022-12-01T11:03:23.4003480Z 2022-12-01T11:03:23.4003755Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4004078Z Ran 1 test in 4.160s 2022-12-01T11:03:23.4004238Z 2022-12-01T11:03:23.4004312Z OK 2022-12-01T11:03:23.4004444Z 2022-12-01T11:03:23.4004566Z Generating XML reports... 2022-12-01T11:03:23.4005195Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110155.xml 2022-12-01T11:03:23.4005936Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4006370Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4006941Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4007409Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4007638Z 2022-12-01T11:03:23.4007729Z Running tests... 2022-12-01T11:03:23.4008127Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4008656Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4009194Z test_collective_shape_mismatch_cuda (__main__.ProcessGroupGlooWrapperTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4009809Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48853 2022-12-01T11:03:23.4010257Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48854 2022-12-01T11:03:23.4010698Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 48855 2022-12-01T11:03:23.4011120Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 48856 2022-12-01T11:03:23.4011726Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4012175Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4012744Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4013193Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4013763Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4014207Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4014761Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4015225Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4015790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4016226Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4016773Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4017234Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4017802Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4018241Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4018790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4019326Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4019775Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4020226Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:03:23.4020680Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:03:23.4021143Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4021528Z skip: Need at least 4 CUDA devices (3.859s) 2022-12-01T11:03:23.4021703Z 2022-12-01T11:03:23.4021974Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4022304Z Ran 1 test in 3.859s 2022-12-01T11:03:23.4022462Z 2022-12-01T11:03:23.4022569Z OK (skipped=1) 2022-12-01T11:03:23.4022722Z 2022-12-01T11:03:23.4022828Z Generating XML reports... 
2022-12-01T11:03:23.4023452Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110202.xml 2022-12-01T11:03:23.4024178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4024623Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4025174Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4025639Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4025868Z 2022-12-01T11:03:23.4025974Z Running tests... 2022-12-01T11:03:23.4026355Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4026956Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4027512Z test_collective_shape_mismatch_cuda_debug_mode (__main__.ProcessGroupGlooWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4028039Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49024 2022-12-01T11:03:23.4028466Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49025 2022-12-01T11:03:23.4028895Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49026 2022-12-01T11:03:23.4029331Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49027 2022-12-01T11:03:23.4029915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4030359Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4030908Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4031352Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4031913Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4032378Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4032956Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4033416Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4033970Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4034413Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4034975Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4035419Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4036073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4036531Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4037096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4037536Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4037968Z 
INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4038435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:03:23.4038880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4039343Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:03:23.4039728Z skip: Need at least 4 CUDA devices (3.870s) 2022-12-01T11:03:23.4039919Z 2022-12-01T11:03:23.4040194Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4040504Z Ran 1 test in 3.870s 2022-12-01T11:03:23.4040665Z 2022-12-01T11:03:23.4040773Z OK (skipped=1) 2022-12-01T11:03:23.4040925Z 2022-12-01T11:03:23.4041046Z Generating XML reports... 2022-12-01T11:03:23.4041652Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110208.xml 2022-12-01T11:03:23.4042876Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4043474Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4044054Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4044622Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4044851Z 2022-12-01T11:03:23.4044957Z Running tests... 2022-12-01T11:03:23.4045362Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4045878Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4046427Z test_collective_shape_mismatch_debug_mode (__main__.ProcessGroupGlooWrapperTest) ... 
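The *_cuda variants above are skipped on this runner ("Need at least 4 CUDA devices"), and the *_debug_mode variants rely on the wrapper's extra collective consistency checks. A sketch of the two knobs involved; the environment variable is the documented TORCH_DISTRIBUTED_DEBUG switch, and setting it here is an assumption about how one would enable the same checks outside the test harness:

    import os

    import torch

    # DETAIL wraps process groups so collectives are checked for shape/op
    # mismatches; it must be set before init_process_group() is called.
    os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"

    # Mirrors the skip reason printed above for the cuda variants.
    if torch.cuda.device_count() < 4:
        print("skip: Need at least 4 CUDA devices")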
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4046951Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49195 2022-12-01T11:03:23.4047396Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49196 2022-12-01T11:03:23.4047820Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49197 2022-12-01T11:03:23.4048250Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49198 2022-12-01T11:03:23.4048858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4049300Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4049857Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4050323Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4050940Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4051362Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4051927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4052384Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4052950Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4053373Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4054015Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4054488Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4055047Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4055484Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4056048Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4056506Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4056918Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4057391Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:03:23.4057852Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:03:23.4058320Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4058789Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:03:23.4059277Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:03:23.4059764Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-12-01T11:03:23.4060230Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-12-01T11:03:23.4060891Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2022-12-01T11:03:23.4061669Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4062353Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4063015Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4063538Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T11:03:23.4064022Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 2 2022-12-01T11:03:23.4064502Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T11:03:23.4064972Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 3 2022-12-01T11:03:23.4065613Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-12-01T11:03:23.4066299Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-12-01T11:03:23.4066974Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-12-01T11:03:23.4067630Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-12-01T11:03:23.4068021Z ok (4.164s) 2022-12-01T11:03:23.4068169Z 2022-12-01T11:03:23.4068435Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4068740Z Ran 1 test in 4.165s 2022-12-01T11:03:23.4068900Z 2022-12-01T11:03:23.4068991Z OK 2022-12-01T11:03:23.4069121Z 2022-12-01T11:03:23.4069243Z Generating XML reports... 2022-12-01T11:03:23.4069868Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110214.xml 2022-12-01T11:03:23.4070593Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4071099Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4071690Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4072139Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4072366Z 2022-12-01T11:03:23.4072473Z Running tests... 2022-12-01T11:03:23.4072871Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4073395Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4073906Z test_collectives_op_mismatch (__main__.ProcessGroupGlooWrapperTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4074410Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49414 2022-12-01T11:03:23.4074860Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49415 2022-12-01T11:03:23.4075285Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49416 2022-12-01T11:03:23.4075721Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49417 2022-12-01T11:03:23.4076317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4076763Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4077315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4077779Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4078351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4078865Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4079422Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4079884Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4080454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4080874Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4081439Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4081895Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4083073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4083521Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4084098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4084562Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4084979Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:03:23.4085453Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4085908Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:03:23.4086371Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4086838Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:03:23.4087329Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-12-01T11:03:23.4087819Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:03:23.4088576Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2022-12-01T11:03:23.4089105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-12-01T11:03:23.4089749Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4090425Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4091106Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4091471Z ok (4.150s) 2022-12-01T11:03:23.4091618Z 2022-12-01T11:03:23.4091881Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4092205Z Ran 1 test in 4.151s 2022-12-01T11:03:23.4092365Z 2022-12-01T11:03:23.4092441Z OK 2022-12-01T11:03:23.4092572Z 2022-12-01T11:03:23.4092694Z Generating XML reports... 2022-12-01T11:03:23.4093328Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110220.xml 2022-12-01T11:03:23.4094060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4094491Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4095060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4095522Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4095751Z 2022-12-01T11:03:23.4095842Z Running tests... 2022-12-01T11:03:23.4096238Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4096863Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4097400Z test_collectives_op_mismatch_cuda (__main__.ProcessGroupGlooWrapperTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4097898Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49621 2022-12-01T11:03:23.4098343Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49622 2022-12-01T11:03:23.4098780Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49623 2022-12-01T11:03:23.4099198Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49624 2022-12-01T11:03:23.4099798Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4100244Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4100811Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4101262Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4101835Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4102276Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4102825Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4103288Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4103856Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4104294Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4104839Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4105302Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4105929Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4106380Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4106932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4107393Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4107821Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4108272Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4108725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:03:23.4109193Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:03:23.4109577Z skip: Need at least 4 CUDA devices (3.856s) 2022-12-01T11:03:23.4109753Z 2022-12-01T11:03:23.4110026Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4110351Z Ran 1 test in 3.857s 2022-12-01T11:03:23.4110511Z 2022-12-01T11:03:23.4110616Z OK (skipped=1) 2022-12-01T11:03:23.4110768Z 2022-12-01T11:03:23.4110875Z Generating XML reports... 
2022-12-01T11:03:23.4111498Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110227.xml 2022-12-01T11:03:23.4112229Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4112673Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4113224Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4113762Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4113990Z 2022-12-01T11:03:23.4114098Z Running tests... 2022-12-01T11:03:23.4114486Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4115015Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4115560Z test_collectives_op_mismatch_cuda_debug_mode (__main__.ProcessGroupGlooWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4116083Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49792 2022-12-01T11:03:23.4116512Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49793 2022-12-01T11:03:23.4116941Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49794 2022-12-01T11:03:23.4117383Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49795 2022-12-01T11:03:23.4117973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4118423Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4118991Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4119450Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4120006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4120448Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4121011Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4136953Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4137705Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4138185Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4138968Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4139508Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4140121Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4140612Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4141222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4141727Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4142179Z 
INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:03:23.4142688Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4143201Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:03:23.4143680Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4144098Z skip: Need at least 4 CUDA devices (3.877s) 2022-12-01T11:03:23.4144303Z 2022-12-01T11:03:23.4144591Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4144937Z Ran 1 test in 3.877s 2022-12-01T11:03:23.4145107Z 2022-12-01T11:03:23.4145203Z OK (skipped=1) 2022-12-01T11:03:23.4145364Z 2022-12-01T11:03:23.4145491Z Generating XML reports... 2022-12-01T11:03:23.4146153Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110233.xml 2022-12-01T11:03:23.4146932Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4147501Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4148124Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4148628Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4148876Z 2022-12-01T11:03:23.4148971Z Running tests... 2022-12-01T11:03:23.4149392Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4149960Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4150587Z test_collectives_op_mismatch_debug_mode (__main__.ProcessGroupGlooWrapperTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4151111Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49963 2022-12-01T11:03:23.4151565Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49964 2022-12-01T11:03:23.4152016Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 49965 2022-12-01T11:03:23.4152446Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 49966 2022-12-01T11:03:23.4153059Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4153512Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4154084Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4154539Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4155113Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4155554Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4156110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4156578Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4157210Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4157666Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4158218Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4158682Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4159253Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4159690Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4160240Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4160706Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4161142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4161596Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:03:23.4162053Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:03:23.4163129Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4163643Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:03:23.4164126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:03:23.4164623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-12-01T11:03:23.4165249Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-12-01T11:03:23.4165913Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2022-12-01T11:03:23.4166604Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4167294Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4167979Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-12-01T11:03:23.4168489Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T11:03:23.4168982Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T11:03:23.4169476Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 3 2022-12-01T11:03:23.4169972Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 2 2022-12-01T11:03:23.4170602Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-12-01T11:03:23.4171282Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-12-01T11:03:23.4171958Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-12-01T11:03:23.4172623Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-12-01T11:03:23.4172991Z ok (4.172s) 2022-12-01T11:03:23.4173140Z 2022-12-01T11:03:23.4173410Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4173744Z Ran 1 test in 4.173s 2022-12-01T11:03:23.4173907Z 2022-12-01T11:03:23.4173982Z OK 2022-12-01T11:03:23.4174116Z 2022-12-01T11:03:23.4174241Z Generating XML reports... 2022-12-01T11:03:23.4174948Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110239.xml 2022-12-01T11:03:23.4175711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4176144Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4176716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4177184Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4177411Z 2022-12-01T11:03:23.4177521Z Running tests... 2022-12-01T11:03:23.4177904Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4178436Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4178955Z test_collective_hang (__main__.ProcessGroupNCCLWrapperTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4179436Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50182 2022-12-01T11:03:23.4179891Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50183 2022-12-01T11:03:23.4180495Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4180943Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4181494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4181955Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4182524Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4183025Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4183600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4184066Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4184501Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4184951Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4185431Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:03:23.4185929Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:03:23.4186568Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:03:23.4187261Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:03:23.4187789Z [E ProcessGroupGloo.cpp:2803] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 2000 ms 2022-12-01T11:03:23.4188256Z [E ProcessGroupGloo.cpp:137] [Rank 0]: Ranks 1 failed to pass monitoredBarrier in 2000 ms 2022-12-01T11:03:23.4188582Z ok (3.765s) 2022-12-01T11:03:23.4188729Z 2022-12-01T11:03:23.4188999Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4189323Z Ran 1 test in 3.765s 2022-12-01T11:03:23.4189485Z 2022-12-01T11:03:23.4189560Z OK 2022-12-01T11:03:23.4189695Z 2022-12-01T11:03:23.4189819Z Generating XML reports... 2022-12-01T11:03:23.4190450Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20221201110245.xml 2022-12-01T11:03:23.4191180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4191616Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4192248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4192733Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4192961Z 2022-12-01T11:03:23.4193070Z Running tests... 
2022-12-01T11:03:23.4193459Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4193991Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4194525Z test_collective_shape_mismatch (__main__.ProcessGroupNCCLWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4195020Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50295 2022-12-01T11:03:23.4195476Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50296 2022-12-01T11:03:23.4196083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4196539Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4197101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4197567Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4198139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4198563Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4199127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4199586Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4200114Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4200589Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:03:23.4201073Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4201556Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:03:23.4202215Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:03:23.4203383Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:03:23.4203780Z ok (5.541s) 2022-12-01T11:03:23.4203929Z 2022-12-01T11:03:23.4204195Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4204508Z Ran 1 test in 5.541s 2022-12-01T11:03:23.4204667Z 2022-12-01T11:03:23.4204760Z OK 2022-12-01T11:03:23.4204894Z 2022-12-01T11:03:23.4205020Z Generating XML reports... 2022-12-01T11:03:23.4205656Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20221201110251.xml 2022-12-01T11:03:23.4206380Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4206836Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4207411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4207863Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4208093Z 2022-12-01T11:03:23.4208202Z Running tests... 
2022-12-01T11:03:23.4208600Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4209133Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4209762Z test_collective_shape_mismatch_debug_mode (__main__.ProcessGroupNCCLWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4210305Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50424 2022-12-01T11:03:23.4210759Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50425 2022-12-01T11:03:23.4211351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4211798Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4212362Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4212827Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4213382Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4213830Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4214394Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4214855Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4215275Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4215744Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4216229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:03:23.4216710Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:03:23.4217365Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:03:23.4218149Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:03:23.4218679Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T11:03:23.4219155Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T11:03:23.4219806Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T11:03:23.4220489Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T11:03:23.4220879Z ok (5.557s) 2022-12-01T11:03:23.4221010Z 2022-12-01T11:03:23.4221276Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4221602Z Ran 1 test in 5.557s 2022-12-01T11:03:23.4221763Z 2022-12-01T11:03:23.4221862Z OK 2022-12-01T11:03:23.4221996Z 2022-12-01T11:03:23.4222104Z Generating XML reports... 
2022-12-01T11:03:23.4222734Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20221201110259.xml 2022-12-01T11:03:23.4223469Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4223919Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4224473Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4224943Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4225170Z 2022-12-01T11:03:23.4225278Z Running tests... 2022-12-01T11:03:23.4225662Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4226188Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4226723Z test_collectives_op_mismatch (__main__.ProcessGroupNCCLWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4227312Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50563 2022-12-01T11:03:23.4228129Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50564 2022-12-01T11:03:23.4229333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4230139Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4231160Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4232038Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4233100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4233864Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4234897Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4235765Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4236614Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4237537Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4238454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:03:23.4239424Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:03:23.4240610Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:03:23.4242018Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:03:23.4243301Z ok (6.659s) 2022-12-01T11:03:23.4243584Z 2022-12-01T11:03:23.4244127Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4244732Z Ran 1 test in 6.659s 2022-12-01T11:03:23.4245027Z 2022-12-01T11:03:23.4245145Z OK 2022-12-01T11:03:23.4245370Z 2022-12-01T11:03:23.4245597Z Generating XML reports... 
2022-12-01T11:03:23.4246824Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20221201110307.xml 2022-12-01T11:03:23.4248341Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4249172Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4250296Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4251276Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4251729Z 2022-12-01T11:03:23.4251915Z Running tests... 2022-12-01T11:03:23.4252672Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4253701Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-12-01T11:03:23.4254796Z test_collectives_op_mismatch_debug_mode (__main__.ProcessGroupNCCLWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:03:23.4255836Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50693 2022-12-01T11:03:23.4256720Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50694 2022-12-01T11:03:23.4257937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4258820Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4259952Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4261062Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4262248Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:03:23.4263130Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:03:23.4264249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:03:23.4265171Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:03:23.4266041Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:03:23.4266982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:03:23.4267892Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:03:23.4268890Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:03:23.4270191Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:03:23.4271569Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:03:23.4272644Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-12-01T11:03:23.4273579Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-12-01T11:03:23.4274919Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2022-12-01T11:03:23.4276295Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-12-01T11:03:23.4277191Z ok (6.530s) 2022-12-01T11:03:23.4277458Z 2022-12-01T11:03:23.4277956Z ---------------------------------------------------------------------- 2022-12-01T11:03:23.4278563Z Ran 1 test in 6.530s 2022-12-01T11:03:23.4278858Z 2022-12-01T11:03:23.4278989Z OK 2022-12-01T11:03:23.4279225Z 2022-12-01T11:03:23.4279444Z Generating XML reports... 2022-12-01T11:03:23.4280670Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20221201110316.xml 2022-12-01T11:03:23.4281416Z 2022-12-01T11:03:23.4281995Z ##[endgroup] 2022-12-01T11:03:23.4283523Z FINISHED PRINTING LOG FILE of distributed/test_pg_wrapper (/var/lib/jenkins/workspace/test/test-reports/distributed-test_pg_wrapper_tgjo6mzb) 2022-12-01T11:03:23.4284205Z 2022-12-01T11:03:23.4284698Z Running distributed/test_c10d_spawn_gloo ... [2022-12-01 11:03:23.392447] 2022-12-01T11:03:23.4286091Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_c10d_spawn_gloo.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:03:23.392796] 2022-12-01T11:04:43.3803271Z 2022-12-01T11:04:43.3804010Z Expand the folded group to see the log file of distributed/test_c10d_spawn_gloo 2022-12-01T11:04:43.3804938Z ##[group]PRINTING LOG FILE of distributed/test_c10d_spawn_gloo (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_spawn_gloo_hvrnel96) 2022-12-01T11:04:43.3807842Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprs0v6h97 2022-12-01T11:04:43.3808423Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprs0v6h97/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3808861Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3809604Z , <__main__.DistributedDataParallelSingleProcessTest testMethod=test_cuda>, <__main__.DistributedDataParallelSingleProcessTest testMethod=test_rnn>]> 2022-12-01T11:04:43.3810413Z test_cpu (__main__.DistributedDataParallelSingleProcessTest) 2022-12-01T11:04:43.3810856Z test_cuda (__main__.DistributedDataParallelSingleProcessTest) 2022-12-01T11:04:43.3811541Z test_rnn (__main__.DistributedDataParallelSingleProcessTest) 2022-12-01T11:04:43.3811948Z 2022-12-01T11:04:43.3812915Z 2022-12-01T11:04:43.3814360Z , <__main__.TestDistributedNNFunctionsGloo testMethod=test_all_to_all>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_all_to_all_single>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_allreduce>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_broadcast>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_gather>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_reduce>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_scatter>]> 2022-12-01T11:04:43.3815631Z test_all_gather (__main__.TestDistributedNNFunctionsGloo) 2022-12-01T11:04:43.3816018Z test_all_to_all (__main__.TestDistributedNNFunctionsGloo) 2022-12-01T11:04:43.3816434Z test_all_to_all_single (__main__.TestDistributedNNFunctionsGloo) 2022-12-01T11:04:43.3816847Z test_allreduce (__main__.TestDistributedNNFunctionsGloo) 2022-12-01T11:04:43.3817234Z test_broadcast (__main__.TestDistributedNNFunctionsGloo) 2022-12-01T11:04:43.3817634Z test_gather (__main__.TestDistributedNNFunctionsGloo) 2022-12-01T11:04:43.3818030Z test_reduce (__main__.TestDistributedNNFunctionsGloo) 
2022-12-01T11:04:43.3818405Z test_scatter (__main__.TestDistributedNNFunctionsGloo) 2022-12-01T11:04:43.3819128Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3819587Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3820165Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3821011Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3821860Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmbz7sflm 2022-12-01T11:04:43.3822821Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmbz7sflm/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3823545Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3823879Z 2022-12-01T11:04:43.3824052Z Running tests... 2022-12-01T11:04:43.3824770Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3825702Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-12-01T11:04:43.3826733Z test_cpu (__main__.DistributedDataParallelSingleProcessTest) ... INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:04:43.3827628Z ok (0.023s) 2022-12-01T11:04:43.3827935Z 2022-12-01T11:04:43.3828399Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3828909Z Ran 1 test in 0.023s 2022-12-01T11:04:43.3829073Z 2022-12-01T11:04:43.3829171Z OK 2022-12-01T11:04:43.3829291Z 2022-12-01T11:04:43.3829412Z Generating XML reports... 2022-12-01T11:04:43.3830108Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20221201110329.xml 2022-12-01T11:04:43.3830890Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3831323Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3831895Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3832361Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3832830Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsjp5gpvz 2022-12-01T11:04:43.3833356Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsjp5gpvz/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3833883Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3834101Z 2022-12-01T11:04:43.3834207Z Running tests... 2022-12-01T11:04:43.3834597Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3835128Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-12-01T11:04:43.3835743Z test_cuda (__main__.DistributedDataParallelSingleProcessTest) ... INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:04:43.3836211Z ok (0.466s) 2022-12-01T11:04:43.3836342Z 2022-12-01T11:04:43.3836612Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3836938Z Ran 1 test in 0.467s 2022-12-01T11:04:43.3837103Z 2022-12-01T11:04:43.3837195Z OK 2022-12-01T11:04:43.3837328Z 2022-12-01T11:04:43.3837434Z Generating XML reports... 
2022-12-01T11:04:43.3838121Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20221201110332.xml 2022-12-01T11:04:43.3838900Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3839349Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3839907Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3840376Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3840838Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpanq07qzk 2022-12-01T11:04:43.3841360Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpanq07qzk/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3841869Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3842067Z 2022-12-01T11:04:43.3842175Z Running tests... 2022-12-01T11:04:43.3842935Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3843547Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-12-01T11:04:43.3844572Z test_rnn (__main__.DistributedDataParallelSingleProcessTest) ... INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-12-01T11:04:43.3845078Z ok (1.249s) 2022-12-01T11:04:43.3845225Z 2022-12-01T11:04:43.3845503Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3845812Z Ran 1 test in 1.249s 2022-12-01T11:04:43.3845976Z 2022-12-01T11:04:43.3846065Z OK 2022-12-01T11:04:43.3846198Z 2022-12-01T11:04:43.3846322Z Generating XML reports... 2022-12-01T11:04:43.3846988Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20221201110336.xml 2022-12-01T11:04:43.3847771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3848229Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3848804Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3849257Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3849725Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzjjk5on5 2022-12-01T11:04:43.3850257Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzjjk5on5/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3850667Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3850863Z 2022-12-01T11:04:43.3850975Z Running tests... 2022-12-01T11:04:43.3851374Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3851898Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-12-01T11:04:43.3852589Z test_all_gather (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50984 2022-12-01T11:04:43.3853151Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50985 2022-12-01T11:04:43.3853760Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3854191Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3854762Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3855228Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3855801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3856232Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3856801Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3857268Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3857715Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpndwyipr2 2022-12-01T11:04:43.3858258Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpndwyipr2/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3858792Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo0nybaik 2022-12-01T11:04:43.3859328Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo0nybaik/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3859733Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3860135Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:04:43.3860638Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3861023Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:04:43.3861511Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:04:43.3862002Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:04:43.3862664Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3863333Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3863724Z ok (4.726s) 2022-12-01T11:04:43.3863875Z 2022-12-01T11:04:43.3864141Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3864467Z Ran 1 test in 4.726s 2022-12-01T11:04:43.3864614Z 2022-12-01T11:04:43.3864710Z OK 2022-12-01T11:04:43.3864840Z 2022-12-01T11:04:43.3864964Z Generating XML reports... 
2022-12-01T11:04:43.3865606Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110341.xml 2022-12-01T11:04:43.3866331Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3866777Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3867348Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3867820Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3868266Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplwsyctkn 2022-12-01T11:04:43.3868804Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplwsyctkn/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3869236Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3869434Z 2022-12-01T11:04:43.3869525Z Running tests... 2022-12-01T11:04:43.3869995Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3870548Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-12-01T11:04:43.3871115Z test_all_to_all (__main__.TestDistributedNNFunctionsGloo) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51100 2022-12-01T11:04:43.3871631Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51101 2022-12-01T11:04:43.3872236Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3872688Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3873239Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3873710Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3874291Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3874735Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3875284Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3875746Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3876209Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn31sex9o 2022-12-01T11:04:43.3876750Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn31sex9o/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3877260Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpis8ptqk2 2022-12-01T11:04:43.3877874Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpis8ptqk2/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3878301Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3878690Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:04:43.3879079Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3879479Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:04:43.3879949Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:04:43.3880440Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:04:43.3881100Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3881785Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3882163Z ok (4.726s) 2022-12-01T11:04:43.3882310Z 2022-12-01T11:04:43.3882863Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3883200Z Ran 1 test in 4.726s 2022-12-01T11:04:43.3883361Z 2022-12-01T11:04:43.3883455Z OK 2022-12-01T11:04:43.3883572Z 2022-12-01T11:04:43.3883695Z Generating XML reports... 2022-12-01T11:04:43.3884337Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110349.xml 2022-12-01T11:04:43.3885077Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3885509Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3886080Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3886551Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3887018Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4qxh1zzs 2022-12-01T11:04:43.3887632Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4qxh1zzs/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3888083Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3888278Z 2022-12-01T11:04:43.3888387Z Running tests... 2022-12-01T11:04:43.3888777Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3889313Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-12-01T11:04:43.3889892Z test_all_to_all_single (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51216 2022-12-01T11:04:43.3890435Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51217 2022-12-01T11:04:43.3891024Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3891477Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3892058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3892524Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3893090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3893526Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3894091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3894538Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3895003Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpulv5vd1g 2022-12-01T11:04:43.3895689Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpulv5vd1g/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3896257Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiv08ly_i 2022-12-01T11:04:43.3896803Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiv08ly_i/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3897250Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3897673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:04:43.3898067Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3898487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:04:43.3898998Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:04:43.3899524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:04:43.3900214Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3900943Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3901358Z ok (4.726s) 2022-12-01T11:04:43.3901512Z 2022-12-01T11:04:43.3901776Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3902125Z Ran 1 test in 4.726s 2022-12-01T11:04:43.3902293Z 2022-12-01T11:04:43.3902386Z OK 2022-12-01T11:04:43.3902523Z 2022-12-01T11:04:43.3902651Z Generating XML reports... 
2022-12-01T11:04:43.3903309Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110357.xml 2022-12-01T11:04:43.3904090Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3904563Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3905154Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3905716Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3906222Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy6pgxw80 2022-12-01T11:04:43.3906791Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy6pgxw80/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3907220Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3907422Z 2022-12-01T11:04:43.3907531Z Running tests... 2022-12-01T11:04:43.3907960Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3908512Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-12-01T11:04:43.3909116Z test_allreduce (__main__.TestDistributedNNFunctionsGloo) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51332 2022-12-01T11:04:43.3909694Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51333 2022-12-01T11:04:43.3910336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3910794Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3911396Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3911887Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3912494Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3912950Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3913551Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3914127Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3914597Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphloy758a 2022-12-01T11:04:43.3915162Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphloy758a/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3915715Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl9_li_5w 2022-12-01T11:04:43.3916272Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl9_li_5w/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3916696Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3917115Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:04:43.3917523Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3917926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:04:43.3918436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:04:43.3918963Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:04:43.3919668Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3920381Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3920803Z ok (4.726s) 2022-12-01T11:04:43.3920956Z 2022-12-01T11:04:43.3921236Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3921561Z Ran 1 test in 4.726s 2022-12-01T11:04:43.3921733Z 2022-12-01T11:04:43.3921825Z OK 2022-12-01T11:04:43.3921964Z 2022-12-01T11:04:43.3922095Z Generating XML reports... 2022-12-01T11:04:43.3922989Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110405.xml 2022-12-01T11:04:43.3923731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3924273Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3924875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3925347Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3925795Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmw5pjqrm 2022-12-01T11:04:43.3926336Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmw5pjqrm/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3926766Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3926961Z 2022-12-01T11:04:43.3927053Z Running tests... 2022-12-01T11:04:43.3927456Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3927996Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-12-01T11:04:43.3928572Z test_broadcast (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51448 2022-12-01T11:04:43.3929097Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51449 2022-12-01T11:04:43.3929700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3930148Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3930700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3931170Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3931743Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3932280Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3932843Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3933308Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3933771Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppd6omt3j 2022-12-01T11:04:43.3934293Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppd6omt3j/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3934825Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptvq25j6j 2022-12-01T11:04:43.3935352Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptvq25j6j/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3935774Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3936162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:04:43.3936566Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3936956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:04:43.3937425Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:04:43.3937916Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:04:43.3938576Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3939258Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3939631Z ok (4.726s) 2022-12-01T11:04:43.3939778Z 2022-12-01T11:04:43.3940043Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3940368Z Ran 1 test in 4.726s 2022-12-01T11:04:43.3940526Z 2022-12-01T11:04:43.3940605Z OK 2022-12-01T11:04:43.3940739Z 2022-12-01T11:04:43.3940862Z Generating XML reports... 
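Each spawned worker prints the same pair of UserWarnings from common_utils.py because the test files are run with --import-slow-tests and --import-disabled-tests (visible in the "Executing [...]" lines that precede each file), which load dictionaries of known-slow and known-disabled tests. A purely hypothetical sketch of how such a mapping can gate a test; only the warning text is taken from the log, the dictionary contents and helper name below are assumptions:

import unittest
import warnings

# Assumed placeholder; the real dictionaries are loaded by
# torch/testing/_internal/common_utils.py when the flags above are passed.
disabled_tests_dict = {}

warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")

def skip_if_disabled(test_id):
    """Hypothetical helper: skip a test whose id appears in the mapping."""
    if test_id in disabled_tests_dict:
        return unittest.skip(f"{test_id} is in the disabled-tests list")
    return lambda fn: fn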
2022-12-01T11:04:43.3941561Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110413.xml 2022-12-01T11:04:43.3942317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3942748Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3943317Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3943788Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3944249Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp84aiycc7 2022-12-01T11:04:43.3944804Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp84aiycc7/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3945239Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3945434Z 2022-12-01T11:04:43.3945542Z Running tests... 2022-12-01T11:04:43.3945929Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3946463Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-12-01T11:04:43.3947030Z test_gather (__main__.TestDistributedNNFunctionsGloo) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51564 2022-12-01T11:04:43.3947563Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51565 2022-12-01T11:04:43.3948151Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3948600Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3949169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3949694Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3950278Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3950721Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3951285Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3951727Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3952190Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8fy9xvpb 2022-12-01T11:04:43.3952724Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8fy9xvpb/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3953256Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpolg93qu8 2022-12-01T11:04:43.3953767Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpolg93qu8/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3954197Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3954601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:04:43.3954974Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3955370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:04:43.3955849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:04:43.3956320Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:04:43.3956977Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3957655Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3958050Z ok (4.728s) 2022-12-01T11:04:43.3958195Z 2022-12-01T11:04:43.3958446Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3958770Z Ran 1 test in 4.728s 2022-12-01T11:04:43.3958930Z 2022-12-01T11:04:43.3959085Z OK 2022-12-01T11:04:43.3959230Z 2022-12-01T11:04:43.3959354Z Generating XML reports... 2022-12-01T11:04:43.3959981Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110421.xml 2022-12-01T11:04:43.3960718Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3961168Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3961721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3962189Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3962905Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp00yl93xz 2022-12-01T11:04:43.3963444Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp00yl93xz/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3963857Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3964052Z 2022-12-01T11:04:43.3964159Z Running tests... 2022-12-01T11:04:43.3964565Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3965083Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-12-01T11:04:43.3965650Z test_reduce (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51680 2022-12-01T11:04:43.3966185Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51681 2022-12-01T11:04:43.3966789Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3967322Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3967896Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3968367Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3968925Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3969367Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3969933Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3970393Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3970837Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6ihsre6j 2022-12-01T11:04:43.3971373Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6ihsre6j/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3971905Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm0eie5e4 2022-12-01T11:04:43.3972439Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm0eie5e4/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3972848Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3973252Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:04:43.3973639Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3974018Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:04:43.3974504Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:04:43.3974995Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:04:43.3975650Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3976322Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3976790Z ok (4.728s) 2022-12-01T11:04:43.3976955Z 2022-12-01T11:04:43.3977221Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3977528Z Ran 1 test in 4.728s 2022-12-01T11:04:43.3977686Z 2022-12-01T11:04:43.3977780Z OK 2022-12-01T11:04:43.3977913Z 2022-12-01T11:04:43.3978035Z Generating XML reports... 
2022-12-01T11:04:43.3978673Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110429.xml 2022-12-01T11:04:43.3979390Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3979837Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3980411Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3980867Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3981334Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxmnsa_fx 2022-12-01T11:04:43.3981872Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxmnsa_fx/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3982297Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3982475Z 2022-12-01T11:04:43.3982585Z Running tests... 2022-12-01T11:04:43.3982986Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3983524Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-12-01T11:04:43.3984074Z test_scatter (__main__.TestDistributedNNFunctionsGloo) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51796 2022-12-01T11:04:43.3984699Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51797 2022-12-01T11:04:43.3985301Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3985753Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3986307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3986772Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3987345Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:04:43.3987782Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:04:43.3988333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:04:43.3988792Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:04:43.3989258Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw_1wn54_ 2022-12-01T11:04:43.3989779Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw_1wn54_/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3990305Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpexgsd6q4 2022-12-01T11:04:43.3990839Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpexgsd6q4/_remote_module_non_scriptable.py 2022-12-01T11:04:43.3991266Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3991570Z INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:04:43.3991972Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:04:43.3992442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:04:43.3992908Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:04:43.3993400Z 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:04:43.3994148Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3994857Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:04:43.3995232Z ok (4.727s) 2022-12-01T11:04:43.3995378Z 2022-12-01T11:04:43.3995642Z ---------------------------------------------------------------------- 2022-12-01T11:04:43.3995965Z Ran 1 test in 4.727s 2022-12-01T11:04:43.3996124Z 2022-12-01T11:04:43.3996198Z OK 2022-12-01T11:04:43.3996331Z 2022-12-01T11:04:43.3996452Z Generating XML reports... 2022-12-01T11:04:43.3997089Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110437.xml 2022-12-01T11:04:43.3997474Z 2022-12-01T11:04:43.3997879Z ##[endgroup] 2022-12-01T11:04:43.3998450Z FINISHED PRINTING LOG FILE of distributed/test_c10d_spawn_gloo (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_spawn_gloo_hvrnel96) 2022-12-01T11:04:43.3998795Z 2022-12-01T11:04:43.3999102Z Running distributed/_shard/sharded_tensor/ops/test_matrix_ops ... [2022-12-01 11:04:43.380654] 2022-12-01T11:04:43.3999834Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_matrix_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:04:43.380917] 2022-12-01T11:05:07.8193634Z 2022-12-01T11:05:07.8196158Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_matrix_ops 2022-12-01T11:05:07.8197174Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_matrix_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_matrix_ops_2w93umm5) 2022-12-01T11:05:07.8197587Z 2022-12-01T11:05:07.8197988Z Running tests... 2022-12-01T11:05:07.8199026Z ---------------------------------------------------------------------- 2022-12-01T11:05:07.8199629Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops 2022-12-01T11:05:07.8200206Z test_sharded_tensor_contiguous (__main__.TestShardedTensorMatrixOps) ... 
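The "Executing [...]" line above shows how the harness launches each test file as its own Python process. A rough way to reproduce that invocation from the test/ directory of a PyTorch checkout, with the command taken directly from the log:

import subprocess
import sys

# Mirrors the logged command: python -bb <test file> -v
# --import-slow-tests --import-disabled-tests
cmd = [
    sys.executable,
    "-bb",
    "distributed/_shard/sharded_tensor/ops/test_matrix_ops.py",
    "-v",
    "--import-slow-tests",
    "--import-disabled-tests",
]
subprocess.run(cmd, check=True)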
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:05:07.8200724Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51912 2022-12-01T11:05:07.8201806Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51913 2022-12-01T11:05:07.8202975Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 51914 2022-12-01T11:05:07.8203805Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 51915 2022-12-01T11:05:07.8204962Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8205819Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8206978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8207989Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8209159Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8210038Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8211180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8212087Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8213265Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8214149Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8215292Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8216415Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8217666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8218495Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8219639Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8220537Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8221390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:07.8222336Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:07.8223294Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:07.8224241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:07.8225054Z skip: Need at least 4 CUDA devices (3.499s) 2022-12-01T11:05:07.8226090Z test_sharded_tensor_layer_norm (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52048 2022-12-01T11:05:07.8227214Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52049 2022-12-01T11:05:07.8228123Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52050 2022-12-01T11:05:07.8229018Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52051 2022-12-01T11:05:07.8230307Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8231211Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8232590Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8233543Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8234738Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8235622Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8236779Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8237703Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8238859Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8239747Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8240887Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8241842Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8243534Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8244406Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8245768Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8246684Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8247573Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:07.8248524Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:07.8249483Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:07.8250405Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:07.8251169Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:07.8252372Z test_sharded_tensor_layer_norm_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52184 2022-12-01T11:05:07.8253540Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52185 2022-12-01T11:05:07.8254423Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52186 2022-12-01T11:05:07.8255314Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52187 2022-12-01T11:05:07.8256622Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8257468Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8258660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8259612Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8260799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8261688Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8262831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8263731Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8264871Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8265803Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8267012Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8268160Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8269358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8270278Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8271443Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8272391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8273225Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:07.8274179Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:07.8275121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:07.8276057Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:07.8276821Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:07.8277819Z test_sharded_tensor_masked_fill (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52320 2022-12-01T11:05:07.8278868Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52321 2022-12-01T11:05:07.8279732Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52322 2022-12-01T11:05:07.8280623Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52323 2022-12-01T11:05:07.8281875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8283191Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8284400Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8285347Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8286752Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8287630Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8288821Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8289725Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8290909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8291811Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8292995Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8293959Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8295125Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8296025Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8297186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8298119Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8298979Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:07.8299945Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:07.8300889Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:07.8301817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:07.8302800Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:07.8303812Z test_sharded_tensor_masked_fill_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52456 2022-12-01T11:05:07.8304946Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52457 2022-12-01T11:05:07.8305842Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52458 2022-12-01T11:05:07.8306719Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52459 2022-12-01T11:05:07.8307991Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8308978Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8310147Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8311092Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8312288Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8313161Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8314345Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8315272Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8316455Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8317336Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8318501Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8319480Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8320645Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8321684Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8323363Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8324303Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8325149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:07.8326100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:07.8327046Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:07.8327982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:07.8328795Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T11:05:07.8329827Z test_sharded_tensor_softmax (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52592 2022-12-01T11:05:07.8330927Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52593 2022-12-01T11:05:07.8331814Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52594 2022-12-01T11:05:07.8332702Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52595 2022-12-01T11:05:07.8333978Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8334849Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8336012Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8336948Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8338333Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8339230Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8340416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8341367Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8342539Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8343480Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8344660Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8345604Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8346740Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8347655Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8348856Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8349802Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8350666Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:07.8351633Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:07.8352602Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:07.8353540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:07.8354342Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:07.8355376Z test_sharded_tensor_transpose (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52728 2022-12-01T11:05:07.8356674Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52729 2022-12-01T11:05:07.8357568Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52730 2022-12-01T11:05:07.8358484Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52731 2022-12-01T11:05:07.8359765Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8360644Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8361787Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8363170Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8364377Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8365260Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8366437Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8367335Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8368523Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8369443Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8370658Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8371594Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8372756Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8373883Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8375093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8376041Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8376915Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:07.8377845Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:07.8378770Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:07.8379689Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:07.8380450Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:07.8381466Z test_sharded_tensor_transpose_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52864 2022-12-01T11:05:07.8382570Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52865 2022-12-01T11:05:07.8383409Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 52866 2022-12-01T11:05:07.8384253Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 52867 2022-12-01T11:05:07.8385477Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8386381Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8387513Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8388460Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8389637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8390505Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8391834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8392794Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8393984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8394854Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8396019Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8396988Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8398180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8399058Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8400247Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8401179Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8402047Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:07.8403436Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:07.8404386Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:07.8405339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:07.8406104Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:07.8407105Z test_sharded_tensor_type_as (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53000 2022-12-01T11:05:07.8408402Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53001 2022-12-01T11:05:07.8409287Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53002 2022-12-01T11:05:07.8410185Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53003 2022-12-01T11:05:07.8411460Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8412346Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8413502Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8414442Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8415640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8416545Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8417711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8418658Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8419823Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8420704Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8421871Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8422819Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8424020Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8424911Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8426267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8427250Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8428106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:07.8429071Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:07.8430008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:07.8430941Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:07.8431710Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:07.8432691Z test_sharded_tensor_view (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53136 2022-12-01T11:05:07.8433806Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53137 2022-12-01T11:05:07.8434712Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53138 2022-12-01T11:05:07.8435611Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53139 2022-12-01T11:05:07.8436890Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8437850Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8439025Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8439981Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8441204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8442277Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8443856Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8444913Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8446110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8446979Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8448136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8449076Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8450276Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8451153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8452356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8453319Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8454188Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:07.8455162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:07.8456110Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:07.8457050Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:07.8457810Z skip: Need at least 4 CUDA devices (1.910s) 2022-12-01T11:05:07.8458801Z test_sharded_tensor_view_error (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53272 2022-12-01T11:05:07.8459951Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53273 2022-12-01T11:05:07.8460859Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53274 2022-12-01T11:05:07.8461912Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53275 2022-12-01T11:05:07.8463205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8464114Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8465245Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8466149Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8467314Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8468280Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8469483Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8470456Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8471666Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8472537Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8473706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8474647Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8475829Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:07.8476703Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:07.8478100Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:07.8479056Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:07.8479926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:07.8480885Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:07.8481868Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:07.8483196Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:07.8483945Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:07.8484340Z 2022-12-01T11:05:07.8484900Z ---------------------------------------------------------------------- 2022-12-01T11:05:07.8485535Z Ran 11 tests in 22.592s 2022-12-01T11:05:07.8485846Z 2022-12-01T11:05:07.8486059Z OK (skipped=11) 2022-12-01T11:05:07.8486325Z 2022-12-01T11:05:07.8486562Z Generating XML reports... 
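Every test in the group above reports "skip: Need at least 4 CUDA devices" because these sharded-tensor tests spawn a four-rank world while the runner exposes fewer GPUs, so the harness skips rather than fails. A minimal sketch of that guard pattern, written with plain unittest and torch.cuda.device_count() instead of PyTorch's internal helper:

    import unittest
    import torch

    def require_n_gpus(n):
        # Skip (rather than fail) when the machine has fewer than n CUDA devices,
        # mirroring the "Need at least 4 CUDA devices" skips in this log.
        return unittest.skipUnless(torch.cuda.device_count() >= n,
                                   f"Need at least {n} CUDA devices")

    class TestShardedTensorSketch(unittest.TestCase):
        @require_n_gpus(4)
        def test_view(self):
            pass  # body only runs on machines with >= 4 GPUs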
2022-12-01T11:05:07.8487924Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops/TEST-TestShardedTensorMatrixOps-20221201110444.xml 2022-12-01T11:05:07.8488738Z 2022-12-01T11:05:07.8489399Z ##[endgroup] 2022-12-01T11:05:07.8490757Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_matrix_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_matrix_ops_2w93umm5) 2022-12-01T11:05:07.8491579Z 2022-12-01T11:05:07.8492164Z Running distributed/test_c10d_object_collectives ... [2022-12-01 11:05:07.819665] 2022-12-01T11:05:07.8493610Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_c10d_object_collectives.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:05:07.819925] 2022-12-01T11:05:24.9369983Z 2022-12-01T11:05:24.9370704Z Expand the folded group to see the log file of distributed/test_c10d_object_collectives 2022-12-01T11:05:24.9372960Z ##[group]PRINTING LOG FILE of distributed/test_c10d_object_collectives (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_object_collectives_98gmzt15) 2022-12-01T11:05:24.9373722Z 2022-12-01T11:05:24.9373923Z Running tests... 2022-12-01T11:05:24.9374810Z ---------------------------------------------------------------------- 2022-12-01T11:05:24.9375911Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_object_collectives 2022-12-01T11:05:24.9376867Z test_all_gather_object (__main__.TestObjectCollectives) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:05:24.9377762Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53443 2022-12-01T11:05:24.9378589Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53444 2022-12-01T11:05:24.9379813Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:24.9380596Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:24.9381187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:24.9381668Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:24.9382448Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:24.9383470Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:24.9384420Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:24.9385125Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:24.9385848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:24.9386750Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:05:24.9387513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:24.9388011Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:05:24.9388696Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
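The "store_based_barrier_key" messages above come from process-group initialization: each rank bumps a shared counter in the rendezvous store and then waits until all world_size ranks have checked in. A rough sketch of that idea using the public torch.distributed Store API; the key name and TCPStore wiring are illustrative, not the exact internals of distributed_c10d:

    import datetime
    import torch.distributed as dist

    def store_based_barrier(rank, store, world_size,
                            key="store_based_barrier_key:1"):
        store.add(key, 1)                      # this rank checks in
        while store.add(key, 0) < world_size:  # add(key, 0) just reads the counter
            pass                               # the real code sleeps and enforces a timeout
        print(f"Rank {rank}: Completed store-based barrier for key:{key} "
              f"with {world_size} nodes.")

    # Illustrative wiring (host/port are placeholders):
    # store = dist.TCPStore("127.0.0.1", 29500, world_size,
    #                       is_master=(rank == 0),
    #                       timeout=datetime.timedelta(seconds=300))
    # store_based_barrier(rank, store, world_size)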
2022-12-01T11:05:24.9389383Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:05:24.9389756Z ok (4.941s) 2022-12-01T11:05:24.9390196Z test_broadcast_object_list (__main__.TestObjectCollectives) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53522 2022-12-01T11:05:24.9390731Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53523 2022-12-01T11:05:24.9391319Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:24.9391776Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:24.9392356Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:24.9392822Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:24.9393382Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:24.9393830Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:24.9394401Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:24.9394867Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:24.9395281Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:24.9395778Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:05:24.9396342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:24.9396825Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:05:24.9397481Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:05:24.9398164Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:05:24.9398557Z ok (3.510s) 2022-12-01T11:05:24.9398963Z test_gather_object (__main__.TestObjectCollectives) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53601 2022-12-01T11:05:24.9399486Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53602 2022-12-01T11:05:24.9400099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:24.9400546Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:24.9401096Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:24.9401542Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:24.9402116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:24.9402923Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:24.9403521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:24.9403984Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:24.9404421Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:24.9405020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:05:24.9405511Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:24.9405998Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:05:24.9406638Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:05:24.9407329Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:05:24.9407722Z ok (3.410s) 2022-12-01T11:05:24.9408154Z test_scatter_object_list (__main__.TestObjectCollectives) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53682 2022-12-01T11:05:24.9408660Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53683 2022-12-01T11:05:24.9409270Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:24.9409724Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:24.9410300Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:24.9410781Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:24.9411358Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:24.9411804Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:24.9412375Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:24.9412825Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:24.9413264Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:24.9413837Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-12-01T11:05:24.9414326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:24.9414807Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-12-01T11:05:24.9415468Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:05:24.9416146Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-12-01T11:05:24.9416522Z ok (3.410s) 2022-12-01T11:05:24.9416670Z 2022-12-01T11:05:24.9416940Z ---------------------------------------------------------------------- 2022-12-01T11:05:24.9417269Z Ran 4 tests in 15.271s 2022-12-01T11:05:24.9417437Z 2022-12-01T11:05:24.9417513Z OK 2022-12-01T11:05:24.9417645Z 2022-12-01T11:05:24.9417770Z Generating XML reports... 2022-12-01T11:05:24.9418390Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_object_collectives/TEST-TestObjectCollectives-20221201110509.xml 2022-12-01T11:05:24.9418758Z 2022-12-01T11:05:24.9419090Z ##[endgroup] 2022-12-01T11:05:24.9419711Z FINISHED PRINTING LOG FILE of distributed/test_c10d_object_collectives (/var/lib/jenkins/workspace/test/test-reports/distributed-test_c10d_object_collectives_98gmzt15) 2022-12-01T11:05:24.9420075Z 2022-12-01T11:05:24.9420382Z Running distributed/_shard/sharded_tensor/ops/test_tensor_ops ... [2022-12-01 11:05:24.937071] 2022-12-01T11:05:24.9421117Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_tensor_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-12-01 11:05:24.937433] 2022-12-01T11:05:37.8903511Z 2022-12-01T11:05:37.8904309Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_tensor_ops 2022-12-01T11:05:37.8905646Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_tensor_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_tensor_ops_d7te19io) 2022-12-01T11:05:37.8906092Z 2022-12-01T11:05:37.8906207Z Running tests... 2022-12-01T11:05:37.8906738Z ---------------------------------------------------------------------- 2022-12-01T11:05:37.8907316Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_tensor_ops 2022-12-01T11:05:37.8907816Z test_clone (__main__.TestTensorOps) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:05:37.8908269Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53798 2022-12-01T11:05:37.8908925Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53799 2022-12-01T11:05:37.8909452Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53800 2022-12-01T11:05:37.8909904Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53801 2022-12-01T11:05:37.8910548Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8910979Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8911564Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8912043Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8912613Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8913069Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8913652Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8914133Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8914834Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8915309Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8915894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8916361Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8916927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8917378Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8917951Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8918408Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8918908Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:37.8919391Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:37.8919860Z INFO:torch.testing._internal.common_distributed:Starting event listener 
thread for rank 3 2022-12-01T11:05:37.8920328Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:37.8920703Z skip: Need at least 4 CUDA devices (3.391s) 2022-12-01T11:05:37.8921157Z test_deep_copy (__main__.TestTensorOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53934 2022-12-01T11:05:37.8921660Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53935 2022-12-01T11:05:37.8922091Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 53936 2022-12-01T11:05:37.8923025Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 53937 2022-12-01T11:05:37.8923642Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8924093Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8924667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8925119Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8925692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8926132Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8926680Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8927139Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8927714Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8928153Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8928706Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8929165Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8929731Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8930149Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8930713Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8931166Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8931599Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:37.8932053Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:37.8932605Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:37.8933091Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:37.8933469Z skip: Need at least 4 CUDA devices (2.010s) 2022-12-01T11:05:37.8933905Z test_detach (__main__.TestTensorOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54070 2022-12-01T11:05:37.8934397Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54071 2022-12-01T11:05:37.8934839Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54072 2022-12-01T11:05:37.8935264Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54073 2022-12-01T11:05:37.8935875Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8936321Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8936888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8937339Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8937905Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8938342Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8938879Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8939319Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8939885Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8940453Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8941073Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8941530Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8942099Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8942519Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8943085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8943541Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8943969Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:37.8944430Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:37.8944884Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:37.8945345Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:37.8945732Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:37.8946171Z test_inplace_copy (__main__.TestTensorOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54206 2022-12-01T11:05:37.8946671Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54207 2022-12-01T11:05:37.8947112Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54208 2022-12-01T11:05:37.8947530Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54209 2022-12-01T11:05:37.8948129Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8948579Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8949207Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8949672Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8950249Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8950695Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8951225Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8951666Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8952230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8952697Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8953266Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8953730Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8954299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8954734Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8955279Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8955736Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8956166Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:37.8956623Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:37.8957154Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:37.8957615Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:37.8958001Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:37.8958442Z test_set_requires_grad (__main__.TestTensorOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54342 2022-12-01T11:05:37.8958952Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54343 2022-12-01T11:05:37.8959394Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54344 2022-12-01T11:05:37.8959814Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54345 2022-12-01T11:05:37.8960415Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8960866Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8961440Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8961889Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8962697Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8963143Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8963694Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8964156Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8964728Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8965163Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8965717Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8966257Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8966848Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:37.8967281Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:37.8967830Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:37.8968287Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:37.8968715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:37.8969165Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:37.8969623Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:37.8970082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:37.8970472Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:37.8970661Z 2022-12-01T11:05:37.8970927Z ---------------------------------------------------------------------- 2022-12-01T11:05:37.8971253Z Ran 5 tests in 11.128s 2022-12-01T11:05:37.8971416Z 2022-12-01T11:05:37.8971522Z OK (skipped=5) 2022-12-01T11:05:37.8971674Z 2022-12-01T11:05:37.8971778Z Generating XML reports... 
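The "Generating XML reports..." / "Generated XML report: test-reports/python-unittest/..." lines are produced by running each unittest suite under an XML-emitting runner so CI can ingest per-test results. A hedged sketch of that pattern using the third-party unittest-xml-reporting package; PyTorch wires this up through its own test flags, and the output directory below is an assumption taken from the report paths in this log:

    import unittest
    import xmlrunner  # pip install unittest-xml-reporting

    if __name__ == "__main__":
        # Write one JUnit-style XML file per test class under test-reports/.
        unittest.main(
            testRunner=xmlrunner.XMLTestRunner(output="test-reports/python-unittest"),
            verbosity=2,
        )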
2022-12-01T11:05:37.8972392Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_tensor_ops/TEST-TestTensorOps-20221201110526.xml 2022-12-01T11:05:37.8972754Z 2022-12-01T11:05:37.8973088Z ##[endgroup] 2022-12-01T11:05:37.8973744Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_tensor_ops (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_tensor_ops_d7te19io) 2022-12-01T11:05:37.8974245Z 2022-12-01T11:05:37.8974527Z Running distributed/_shard/test_partial_tensor ... [2022-12-01 11:05:37.890400] 2022-12-01T11:05:37.8975218Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/test_partial_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:05:37.890793] 2022-12-01T11:05:50.9855450Z 2022-12-01T11:05:50.9856429Z Expand the folded group to see the log file of distributed/_shard/test_partial_tensor 2022-12-01T11:05:50.9857431Z ##[group]PRINTING LOG FILE of distributed/_shard/test_partial_tensor (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-test_partial_tensor_wabbvc3n) 2022-12-01T11:05:50.9857799Z 2022-12-01T11:05:50.9857917Z Running tests... 2022-12-01T11:05:50.9858446Z ---------------------------------------------------------------------- 2022-12-01T11:05:50.9859259Z Test results will be stored in test-reports/python-unittest/distributed._shard.test_partial_tensor 2022-12-01T11:05:50.9859782Z test_cat (__main__.TestPartialTensorOps) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:05:50.9860609Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54513 2022-12-01T11:05:50.9861045Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54514 2022-12-01T11:05:50.9861489Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54515 2022-12-01T11:05:50.9862172Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54516 2022-12-01T11:05:50.9862862Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9863622Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9864202Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9864722Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9865299Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9866024Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9866629Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9867078Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9867647Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9868110Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9868670Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9869138Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9869718Z 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9870165Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9870722Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9871182Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9871617Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:50.9872073Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:50.9872532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:50.9872999Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:50.9873527Z skip: Need at least 4 CUDA devices (3.602s) 2022-12-01T11:05:50.9873982Z test_cat_errors (__main__.TestPartialTensorOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54649 2022-12-01T11:05:50.9874505Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54650 2022-12-01T11:05:50.9874954Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54651 2022-12-01T11:05:50.9875377Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54652 2022-12-01T11:05:50.9875991Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9876441Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9877013Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9877463Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9878040Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9878486Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9879042Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9879506Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9880079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9880521Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9881071Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9881535Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9882109Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9882852Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9883521Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9884007Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9884444Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 
1 2022-12-01T11:05:50.9884899Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:50.9885360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:50.9885813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:50.9886203Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:50.9886660Z test_transpose (__main__.TestPartialTensorOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54785 2022-12-01T11:05:50.9887179Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54786 2022-12-01T11:05:50.9887624Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54787 2022-12-01T11:05:50.9888046Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54788 2022-12-01T11:05:50.9888663Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9889112Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9889679Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9890108Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9890681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9891252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9891844Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9892292Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9892871Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9893314Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9893865Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9894325Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9894898Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9895340Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9895892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9896355Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9896793Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:50.9897243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:50.9897702Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:50.9898159Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:50.9898550Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:50.9899028Z test_partial_tensor_reshard (__main__.TestPartialTensorReshard) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54921 2022-12-01T11:05:50.9899579Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54922 2022-12-01T11:05:50.9900086Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 54923 2022-12-01T11:05:50.9900548Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 54924 2022-12-01T11:05:50.9901140Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9901581Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9902153Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9902603Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9903187Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9903632Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9904204Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9904650Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9905224Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9905666Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9906218Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9906677Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9907251Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9907771Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9908325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9908796Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9909234Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:50.9909707Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:50.9910158Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:50.9910616Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:50.9911008Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:50.9911495Z test_partial_tensor_reshard_errors (__main__.TestPartialTensorReshard) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55057 2022-12-01T11:05:50.9912052Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55058 2022-12-01T11:05:50.9912505Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 55059 2022-12-01T11:05:50.9912948Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 55060 2022-12-01T11:05:50.9913540Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9913986Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9914557Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9915007Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9915587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9916027Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9916597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9917110Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9917707Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9918146Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9918692Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9919158Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9919730Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:05:50.9920170Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:05:50.9920727Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:05:50.9921193Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:05:50.9921626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:05:50.9922097Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:05:50.9922956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:05:50.9923507Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:05:50.9923903Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:05:50.9924096Z 2022-12-01T11:05:50.9924358Z ---------------------------------------------------------------------- 2022-12-01T11:05:50.9924691Z Ran 5 tests in 11.240s 2022-12-01T11:05:50.9924990Z 2022-12-01T11:05:50.9925101Z OK (skipped=5) 2022-12-01T11:05:50.9925256Z 2022-12-01T11:05:50.9925381Z Generating XML reports... 
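Each "Executing ['/opt/conda/bin/python', '-bb', ...]" line above shows the driver launching one test file as its own interpreter process, with -bb (turn bytes/str comparison warnings into errors) plus the --import-slow-tests and --import-disabled-tests flags. A sketch of reproducing a single invocation with the standard library, assuming it is run from the repository's test/ directory:

    import subprocess
    import sys

    # Mirror one "Executing [...]" line from this log; adjust the file path as needed.
    cmd = [sys.executable, "-bb", "distributed/_shard/test_partial_tensor.py",
           "-v", "--import-slow-tests", "--import-disabled-tests"]
    result = subprocess.run(cmd, check=False)
    print("exit code:", result.returncode)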
2022-12-01T11:05:50.9925986Z Generated XML report: test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorOps-20221201110539.xml 2022-12-01T11:05:50.9926796Z Generated XML report: test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorReshard-20221201110539.xml 2022-12-01T11:05:50.9927169Z 2022-12-01T11:05:50.9927494Z ##[endgroup] 2022-12-01T11:05:50.9928083Z FINISHED PRINTING LOG FILE of distributed/_shard/test_partial_tensor (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-test_partial_tensor_wabbvc3n) 2022-12-01T11:05:50.9928439Z 2022-12-01T11:05:50.9928732Z Running distributed/elastic/timer/local_timer_example ... [2022-12-01 11:05:50.985613] 2022-12-01T11:05:50.9929448Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/elastic/timer/local_timer_example.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:05:50.985943] 2022-12-01T11:06:02.3670999Z 2022-12-01T11:06:02.3671748Z Expand the folded group to see the log file of distributed/elastic/timer/local_timer_example 2022-12-01T11:06:02.3672906Z ##[group]PRINTING LOG FILE of distributed/elastic/timer/local_timer_example (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-timer-local_timer_example_qpzognty) 2022-12-01T11:06:02.3673312Z 2022-12-01T11:06:02.3673424Z Running tests... 2022-12-01T11:06:02.3673947Z ---------------------------------------------------------------------- 2022-12-01T11:06:02.3674525Z Test results will be stored in test-reports/python-unittest/distributed.elastic.timer.local_timer_example 2022-12-01T11:06:02.3675200Z test_example_start_method_spawn (__main__.LocalTimerExample) ... [INFO] 2022-12-01 11:05:53,988 driver: init 2022-12-01T11:06:02.3675753Z [INFO] 2022-12-01 11:05:54,023 api: Starting LocalTimerServer... max_interval=0.01, daemon=True 2022-12-01T11:06:02.3676233Z [INFO] 2022-12-01 11:05:54,023 api: Starting watchdog thread... 
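The UserWarnings "loaded 62 slow tests" and "loaded 421 disabled tests" that every spawned process prints come from common_utils.py reading JSON dictionaries of known-slow and currently-disabled tests (enabled by the --import-slow-tests / --import-disabled-tests flags) and warning with their sizes. A rough sketch of that pattern; the file names here are placeholders, not the actual paths the harness uses:

    import json
    import warnings

    def load_test_dict(path, label):
        # Read a {test_name: metadata} mapping and report how many entries were loaded.
        with open(path) as f:
            tests = json.load(f)
        warnings.warn(f"loaded {len(tests)} {label}")
        return tests

    # slow_tests_dict = load_test_dict("slow-tests.json", "slow tests")
    # disabled_tests_dict = load_test_dict("disabled-tests.json", "disabled tests")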
2022-12-01T11:06:02.3676771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3677500Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3678742Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3679578Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3680627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3681443Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3682892Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3683805Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3685016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3685907Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3687076Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3687990Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3689139Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3690061Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3691212Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3692120Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3693482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3694379Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3695519Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3696432Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3697545Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3698398Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3699553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3700631Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3701803Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3702711Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3703915Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3704841Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3706028Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 
2022-12-01T11:06:02.3706938Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3708094Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3709028Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3710015Z [INFO] 2022-12-01 11:05:55,534 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3711018Z [INFO] 2022-12-01 11:05:55,619 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3711953Z [INFO] 2022-12-01 11:05:55,650 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3713061Z [INFO] 2022-12-01 11:05:55,661 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3736075Z [INFO] 2022-12-01 11:05:55,669 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3737213Z [INFO] 2022-12-01 11:05:55,672 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3738188Z [INFO] 2022-12-01 11:05:55,672 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3739230Z [INFO] 2022-12-01 11:05:55,674 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3740385Z [INFO] 2022-12-01 11:05:56,607 api: Reaping worker_id=[55235]. Expired timers: ['/opt/conda/lib/python3.10/contextlib.py#135'] 2022-12-01T11:06:02.3741392Z [INFO] 2022-12-01 11:05:56,608 api: Successfully reaped worker=[55235] 2022-12-01T11:06:02.3742548Z [INFO] 2022-12-01 11:05:56,699 api: Reaping worker_id=[55231]. Expired timers: ['/opt/conda/lib/python3.10/contextlib.py#135'] 2022-12-01T11:06:02.3743599Z [INFO] 2022-12-01 11:05:56,700 api: Successfully reaped worker=[55231] 2022-12-01T11:06:02.3744759Z [INFO] 2022-12-01 11:05:56,730 api: Reaping worker_id=[55233]. Expired timers: ['/opt/conda/lib/python3.10/contextlib.py#135'] 2022-12-01T11:06:02.3745778Z [INFO] 2022-12-01 11:05:56,731 api: Successfully reaped worker=[55233] 2022-12-01T11:06:02.3746923Z [INFO] 2022-12-01 11:05:56,751 api: Reaping worker_id=[55229]. Expired timers: ['/opt/conda/lib/python3.10/contextlib.py#135'] 2022-12-01T11:06:02.3747954Z [INFO] 2022-12-01 11:05:56,752 api: Successfully reaped worker=[55229] 2022-12-01T11:06:02.3748823Z [INFO] 2022-12-01 11:05:56,761 api: Stopping LocalTimerServer 2022-12-01T11:06:02.3749639Z [INFO] 2022-12-01 11:05:56,761 api: Stopping watchdog thread... 2022-12-01T11:06:02.3750203Z ok (4.297s) 2022-12-01T11:06:02.3751297Z test_torch_mp_example (__main__.LocalTimerExample) ... [INFO] 2022-12-01 11:05:56,764 api: Starting LocalTimerServer... max_interval=0.01, daemon=True 2022-12-01T11:06:02.3752588Z [INFO] 2022-12-01 11:05:56,764 api: Starting watchdog thread... 
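The local_timer_example output above shows torch.distributed.elastic.timer in action: each worker registers a LocalTimerClient and arms an expiration timer around a guarded block, while a LocalTimerServer watchdog thread polls every max_interval and reaps any worker whose timer has expired (the "Reaping worker_id=[...]" / "Successfully reaped worker" lines). A hedged sketch of the public API; the workload and timeouts are illustrative, and constructor arguments may differ across PyTorch versions:

    import time
    import torch.multiprocessing as mp
    from torch.distributed.elastic.timer import (
        LocalTimerClient, LocalTimerServer, configure, expires)

    def worker(mp_queue):
        configure(LocalTimerClient(mp_queue))   # "Timer client configured to: LocalTimerClient"
        with expires(after=5.0):                # reaped if this block runs longer than 5s
            time.sleep(0.1)                     # stand-in workload

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")
        mp_queue = ctx.Queue()
        server = LocalTimerServer(mp_queue, max_interval=0.01, daemon=True)
        server.start()                          # "Starting watchdog thread..."
        p = ctx.Process(target=worker, args=(mp_queue,))
        p.start(); p.join()
        server.stop()                           # "Stopping LocalTimerServer"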
2022-12-01T11:06:02.3753711Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3754586Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3755657Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3756566Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3757716Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3758597Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3759741Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3760721Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3761909Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3763103Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3764267Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3765206Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3766371Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3767295Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3768453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3769406Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3770817Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3771716Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3772894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3773838Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3774986Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3775858Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3777040Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3777973Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3779126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3780022Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3781182Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3782124Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3783302Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 
2022-12-01T11:06:02.3784190Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3785416Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3786431Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3787606Z [INFO] 2022-12-01 11:05:58,377 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3788609Z [INFO] 2022-12-01 11:05:58,412 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3789586Z [INFO] 2022-12-01 11:05:58,417 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3790563Z [INFO] 2022-12-01 11:05:58,452 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3791534Z [INFO] 2022-12-01 11:05:58,456 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3792499Z [INFO] 2022-12-01 11:05:58,458 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3793478Z [INFO] 2022-12-01 11:05:58,470 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3794430Z [INFO] 2022-12-01 11:05:58,488 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3795616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3796526Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3797701Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3798656Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3799874Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3800762Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3801899Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3803107Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3804306Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3805187Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3806560Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3807563Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3808754Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3809630Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3810810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3811758Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3812902Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3813803Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3814946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 
2022-12-01T11:06:02.3815880Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3817035Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3817931Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3819110Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3820053Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3821222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3822131Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3823516Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3824425Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3825587Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:02.3826485Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:02.3827667Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:02.3828615Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:02.3829615Z [INFO] 2022-12-01 11:06:00,872 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3830588Z [INFO] 2022-12-01 11:06:00,893 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3831547Z [INFO] 2022-12-01 11:06:00,910 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3832550Z [INFO] 2022-12-01 11:06:00,912 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3833517Z [INFO] 2022-12-01 11:06:00,977 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3834488Z [INFO] 2022-12-01 11:06:00,982 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3835439Z [INFO] 2022-12-01 11:06:00,983 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3836393Z [INFO] 2022-12-01 11:06:00,984 api: Timer client configured to: LocalTimerClient 2022-12-01T11:06:02.3837553Z [INFO] 2022-12-01 11:06:01,947 api: Reaping worker_id=[55780]. Expired timers: ['/opt/conda/lib/python3.10/contextlib.py#135'] 2022-12-01T11:06:02.3838575Z [INFO] 2022-12-01 11:06:01,948 api: Successfully reaped worker=[55780] 2022-12-01T11:06:02.3839766Z [INFO] 2022-12-01 11:06:01,968 api: Reaping worker_id=[55775]. Expired timers: ['/opt/conda/lib/python3.10/contextlib.py#135'] 2022-12-01T11:06:02.3840815Z [INFO] 2022-12-01 11:06:01,968 api: Successfully reaped worker=[55775] 2022-12-01T11:06:02.3842089Z [INFO] 2022-12-01 11:06:01,988 api: Reaping worker_id=[55774]. Expired timers: ['/opt/conda/lib/python3.10/contextlib.py#135'] 2022-12-01T11:06:02.3843469Z [INFO] 2022-12-01 11:06:01,988 local_timer: Process with pid=55774 does not exist. Skipping 2022-12-01T11:06:02.3844428Z [INFO] 2022-12-01 11:06:01,989 api: Successfully reaped worker=[55774] 2022-12-01T11:06:02.3845596Z [INFO] 2022-12-01 11:06:01,989 api: Reaping worker_id=[55778]. 
Expired timers: ['/opt/conda/lib/python3.10/contextlib.py#135'] 2022-12-01T11:06:02.3846601Z [INFO] 2022-12-01 11:06:01,989 api: Successfully reaped worker=[55778] 2022-12-01T11:06:02.3847460Z [INFO] 2022-12-01 11:06:02,018 api: Stopping LocalTimerServer 2022-12-01T11:06:02.3848292Z [INFO] 2022-12-01 11:06:02,018 api: Stopping watchdog thread... 2022-12-01T11:06:02.3848831Z ok (5.257s) 2022-12-01T11:06:02.3849100Z 2022-12-01T11:06:02.3849606Z ---------------------------------------------------------------------- 2022-12-01T11:06:02.3850258Z Ran 2 tests in 9.555s 2022-12-01T11:06:02.3850566Z 2022-12-01T11:06:02.3850734Z OK 2022-12-01T11:06:02.3850982Z 2022-12-01T11:06:02.3851228Z Generating XML reports... 2022-12-01T11:06:02.3852484Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_example/TEST-LocalTimerExample-20221201110552.xml 2022-12-01T11:06:02.3853250Z 2022-12-01T11:06:02.3853875Z ##[endgroup] 2022-12-01T11:06:02.3855218Z FINISHED PRINTING LOG FILE of distributed/elastic/timer/local_timer_example (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-timer-local_timer_example_qpzognty) 2022-12-01T11:06:02.3856025Z 2022-12-01T11:06:02.3856627Z Running distributed/_shard/sharded_tensor/ops/test_linear ... [2022-12-01 11:06:02.367218] 2022-12-01T11:06:02.3858076Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_linear.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:06:02.367548] 2022-12-01T11:06:11.5159161Z 2022-12-01T11:06:11.5159798Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_linear 2022-12-01T11:06:11.5161194Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_linear (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_linear_rdpq198j) 2022-12-01T11:06:11.5161612Z 2022-12-01T11:06:11.5161789Z Running tests... 2022-12-01T11:06:11.5162287Z ---------------------------------------------------------------------- 2022-12-01T11:06:11.5163163Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_linear 2022-12-01T11:06:11.5163707Z test_sharded_linear_colwise (__main__.TestShardedTensorOpsLinear) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:06:11.5164193Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56081 2022-12-01T11:06:11.5164647Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56082 2022-12-01T11:06:11.5165105Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 56083 2022-12-01T11:06:11.5165546Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 56084 2022-12-01T11:06:11.5166169Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5166624Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5167191Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5167667Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5168230Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5168677Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5169255Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5169971Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5170597Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5171044Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5171616Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5172064Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5172640Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5173085Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5173637Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5174111Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5174551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:06:11.5175030Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:06:11.5175482Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:06:11.5175941Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:06:11.5176342Z skip: Need at least 4 CUDA devices (3.482s) 2022-12-01T11:06:11.5176826Z test_sharded_linear_errors (__main__.TestShardedTensorOpsLinear) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56217 2022-12-01T11:06:11.5177375Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56218 2022-12-01T11:06:11.5177951Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 56219 2022-12-01T11:06:11.5178399Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 56220 2022-12-01T11:06:11.5179006Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5179458Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5180037Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5180505Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5181068Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5181511Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5182079Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5182534Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5183114Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5183557Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5184126Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5184577Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5185155Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5185594Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5186163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5186615Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5187122Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:06:11.5187614Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:06:11.5188069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:06:11.5188530Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:06:11.5188920Z skip: Need at least 4 CUDA devices (1.911s) 2022-12-01T11:06:11.5189424Z test_sharded_linear_rowwise (__main__.TestShardedTensorOpsLinear) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56353 2022-12-01T11:06:11.5189961Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56354 2022-12-01T11:06:11.5190407Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 56355 2022-12-01T11:06:11.5190850Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 56356 2022-12-01T11:06:11.5191454Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5191903Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5192482Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5192951Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5193511Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5193956Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5194525Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5195058Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5195641Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5196090Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5196656Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5197105Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5197681Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:11.5198123Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:11.5198695Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:11.5199142Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:11.5199579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:06:11.5200055Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:06:11.5200508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:06:11.5200975Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:06:11.5201361Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:06:11.5201554Z 2022-12-01T11:06:11.5201827Z ---------------------------------------------------------------------- 2022-12-01T11:06:11.5202137Z Ran 3 tests in 7.303s 2022-12-01T11:06:11.5202299Z 2022-12-01T11:06:11.5202590Z OK (skipped=3) 2022-12-01T11:06:11.5202759Z 2022-12-01T11:06:11.5202885Z Generating XML reports... 
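The "Started process N with pid ...", "Starting event listener thread for rank N", and "skip: Need at least 4 CUDA devices" lines above are the standard footprint of PyTorch's internal multi-process test harness. The following is a hedged sketch of that pattern, using the internal helpers named in the log (torch.testing._internal.common_distributed); these are private APIs whose details may differ between releases.

    # Hedged sketch: a 4-rank sharded-tensor op test gated on GPU count.
    from torch.testing._internal.common_distributed import (
        MultiProcessTestCase,
        skip_if_lt_x_gpu,
    )
    from torch.testing._internal.common_utils import run_tests


    class MyShardedOpTest(MultiProcessTestCase):
        @property
        def world_size(self) -> int:
            return 4  # matches the four "Started process N with pid ..." lines

        def setUp(self) -> None:
            super().setUp()
            self._spawn_processes()  # one subprocess per rank, as logged above

        @skip_if_lt_x_gpu(4)  # emits "Need at least 4 CUDA devices" otherwise
        def test_something_sharded(self) -> None:
            # Runs once per rank; self.rank is provided by the harness. Real
            # tests initialize a process group here and exercise the sharded op.
            ...


    if __name__ == "__main__":
        run_tests()

On this single-GPU runner the decorator fires before any process group is created, which is why every sharded-tensor test in this shard reports "skip: Need at least 4 CUDA devices".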
2022-12-01T11:06:11.5203538Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_linear/TEST-TestShardedTensorOpsLinear-20221201110603.xml 2022-12-01T11:06:11.5203947Z 2022-12-01T11:06:11.5204271Z ##[endgroup] 2022-12-01T11:06:11.5205103Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_linear (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_linear_rdpq198j) 2022-12-01T11:06:11.5205525Z 2022-12-01T11:06:11.5205854Z Running distributed/_shard/sharded_tensor/ops/test_softmax ... [2022-12-01 11:06:11.515962] 2022-12-01T11:06:11.5206570Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_softmax.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:06:11.516343] 2022-12-01T11:06:18.7327120Z 2022-12-01T11:06:18.7327910Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_softmax 2022-12-01T11:06:18.7328939Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_softmax (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_softmax_sk8t3008) 2022-12-01T11:06:18.7329663Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt4nn8wgd 2022-12-01T11:06:18.7330215Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt4nn8wgd/_remote_module_non_scriptable.py 2022-12-01T11:06:18.7330559Z 2022-12-01T11:06:18.7330676Z Running tests... 2022-12-01T11:06:18.7331191Z ---------------------------------------------------------------------- 2022-12-01T11:06:18.7331770Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax 2022-12-01T11:06:18.7332302Z test_sharded_softmax_basic (__main__.TestShardedSoftmax) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:06:18.7332786Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56524 2022-12-01T11:06:18.7333225Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56525 2022-12-01T11:06:18.7333943Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 56526 2022-12-01T11:06:18.7334376Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 56527 2022-12-01T11:06:18.7335012Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:18.7335450Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:18.7336028Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:18.7336498Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:18.7337058Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:18.7337506Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:18.7338128Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:18.7338598Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:18.7339162Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:18.7339615Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:18.7340190Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:18.7340651Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:18.7341208Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:18.7341651Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:18.7342215Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:18.7342664Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:18.7343253Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd5srffx0 2022-12-01T11:06:18.7343834Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd5srffx0/_remote_module_non_scriptable.py 2022-12-01T11:06:18.7344347Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:06:18.7344833Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkwxniwka 2022-12-01T11:06:18.7345374Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkwxniwka/_remote_module_non_scriptable.py 2022-12-01T11:06:18.7345883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:06:18.7346363Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdll0jdo3 2022-12-01T11:06:18.7346903Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdll0jdo3/_remote_module_non_scriptable.py 2022-12-01T11:06:18.7347437Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzuo6hioh 
2022-12-01T11:06:18.7347970Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzuo6hioh/_remote_module_non_scriptable.py 2022-12-01T11:06:18.7348458Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:06:18.7348926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:06:18.7349318Z skip: Need at least 4 CUDA devices (3.439s) 2022-12-01T11:06:18.7349809Z test_sharded_softmax_on_sharding_dim (__main__.TestShardedSoftmax) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56660 2022-12-01T11:06:18.7350331Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56661 2022-12-01T11:06:18.7350779Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 56662 2022-12-01T11:06:18.7351296Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 56663 2022-12-01T11:06:18.7351903Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:18.7352357Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:18.7352937Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:18.7353408Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:18.7353973Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:18.7354424Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:18.7354994Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:18.7355446Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:18.7356027Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:18.7356470Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:18.7357040Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:18.7357484Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:18.7358057Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:18.7358495Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:18.7359064Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:18.7359516Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:18.7359982Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi0ebf495 2022-12-01T11:06:18.7360587Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi0ebf495/_remote_module_non_scriptable.py 2022-12-01T11:06:18.7361382Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprf60vncb 2022-12-01T11:06:18.7361918Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprf60vncb/_remote_module_non_scriptable.py 2022-12-01T11:06:18.7362659Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnis4xf_v 2022-12-01T11:06:18.7363207Z 
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnis4xf_v/_remote_module_non_scriptable.py 2022-12-01T11:06:18.7363701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:06:18.7364173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:06:18.7364648Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:06:18.7365130Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph7nx65zn 2022-12-01T11:06:18.7365667Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph7nx65zn/_remote_module_non_scriptable.py 2022-12-01T11:06:18.7366172Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:06:18.7366564Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:06:18.7366759Z 2022-12-01T11:06:18.7367037Z ---------------------------------------------------------------------- 2022-12-01T11:06:18.7367373Z Ran 2 tests in 5.348s 2022-12-01T11:06:18.7367536Z 2022-12-01T11:06:18.7367644Z OK (skipped=2) 2022-12-01T11:06:18.7367797Z 2022-12-01T11:06:18.7367920Z Generating XML reports... 2022-12-01T11:06:18.7368536Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax/TEST-TestShardedSoftmax-20221201110613.xml 2022-12-01T11:06:18.7369031Z 2022-12-01T11:06:18.7369352Z ##[endgroup] 2022-12-01T11:06:18.7370038Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_softmax (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_softmax_sk8t3008) 2022-12-01T11:06:18.7370428Z 2022-12-01T11:06:18.7370757Z Running distributed/_shard/sharded_tensor/ops/test_embedding_bag ... [2022-12-01 11:06:18.732739] 2022-12-01T11:06:18.7371511Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_embedding_bag.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:06:18.733098] 2022-12-01T11:06:25.9733640Z 2022-12-01T11:06:25.9734435Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/ops/test_embedding_bag 2022-12-01T11:06:25.9736263Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_embedding_bag (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_embedding_bag_laap3s0v) 2022-12-01T11:06:25.9737035Z 2022-12-01T11:06:25.9737294Z Running tests... 2022-12-01T11:06:25.9737886Z ---------------------------------------------------------------------- 2022-12-01T11:06:25.9738480Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding_bag 2022-12-01T11:06:25.9739055Z test_sharded_embedding_bag_colwise (__main__.TestShardedEmbeddingBag) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:06:25.9739559Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56831 2022-12-01T11:06:25.9740001Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56832 2022-12-01T11:06:25.9740731Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 56833 2022-12-01T11:06:25.9741182Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 56834 2022-12-01T11:06:25.9741831Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:25.9742563Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:25.9743221Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:25.9743705Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:25.9744315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:25.9744803Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:25.9745432Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:25.9745930Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:25.9746554Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:25.9747009Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:25.9747619Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:25.9748113Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:25.9748700Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:25.9749179Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:25.9749785Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:25.9750280Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:25.9750726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:06:25.9751356Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:06:25.9751863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:06:25.9752348Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:06:25.9752765Z skip: Need at least 4 CUDA devices (3.467s) 2022-12-01T11:06:25.9753305Z test_sharded_embedding_bag_rowwise (__main__.TestShardedEmbeddingBag) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56967 2022-12-01T11:06:25.9753891Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56968 2022-12-01T11:06:25.9754350Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 56969 2022-12-01T11:06:25.9754828Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 56970 2022-12-01T11:06:25.9755481Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:25.9755959Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:25.9756553Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:25.9757055Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:25.9757672Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:25.9758132Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:25.9758740Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:25.9759238Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:25.9759850Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:25.9760310Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:25.9760982Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:25.9761496Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:25.9762093Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:25.9762874Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:25.9763471Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:25.9763943Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:25.9764368Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:06:25.9764847Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:06:25.9765316Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:06:25.9765768Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:06:25.9766163Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:06:25.9766360Z 2022-12-01T11:06:25.9766632Z ---------------------------------------------------------------------- 2022-12-01T11:06:25.9766962Z Ran 2 tests in 5.376s 2022-12-01T11:06:25.9767125Z 2022-12-01T11:06:25.9767216Z OK (skipped=2) 2022-12-01T11:06:25.9767371Z 2022-12-01T11:06:25.9767496Z Generating XML reports... 
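For context on what the skipped linear/softmax/embedding_bag tests would exercise with enough GPUs: each rank builds a ShardedTensor from a chunk sharding spec and the op is then dispatched collectively. The sketch below assumes the private torch.distributed._shard API of this era (placement-string format and signatures may differ in other releases) and is only an illustration, not the tests' code.

    # Hedged sketch: constructing the kind of ShardedTensor these op tests use.
    from torch.distributed._shard import sharded_tensor
    from torch.distributed._shard.sharding_spec import ChunkShardingSpec


    def build_sharded_tensor() -> "sharded_tensor.ShardedTensor":
        # Must run on every rank of an already-initialized 4-rank process
        # group, one CUDA device per rank (hence the skip on this runner).
        spec = ChunkShardingSpec(
            dim=0,
            placements=[f"rank:{r}/cuda:{r}" for r in range(4)],
        )
        # Collectively creates a 16x32 tensor whose rows are chunked across
        # ranks; the ops tests then run linear/softmax/embedding_bag on
        # tensors built this way.
        return sharded_tensor.rand(spec, 16, 32)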
2022-12-01T11:06:25.9768166Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding_bag/TEST-TestShardedEmbeddingBag-20221201110620.xml 2022-12-01T11:06:25.9768564Z 2022-12-01T11:06:25.9768879Z ##[endgroup] 2022-12-01T11:06:25.9769697Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/ops/test_embedding_bag (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-ops-test_embedding_bag_laap3s0v) 2022-12-01T11:06:25.9770118Z 2022-12-01T11:06:25.9770453Z Running distributed/_shard/sharded_tensor/test_sharded_tensor_reshard ... [2022-12-01 11:06:25.973398] 2022-12-01T11:06:25.9771220Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/sharded_tensor/test_sharded_tensor_reshard.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:06:25.973749] 2022-12-01T11:06:33.2072052Z 2022-12-01T11:06:33.2073158Z Expand the folded group to see the log file of distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 2022-12-01T11:06:33.2075287Z ##[group]PRINTING LOG FILE of distributed/_shard/sharded_tensor/test_sharded_tensor_reshard (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-test_sharded_tensor_reshard_qccbinq1) 2022-12-01T11:06:33.2076202Z 2022-12-01T11:06:33.2076422Z Running tests... 2022-12-01T11:06:33.2077259Z ---------------------------------------------------------------------- 2022-12-01T11:06:33.2077893Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard 2022-12-01T11:06:33.2078409Z test_sharded_tensor_reshard (__main__.TestReshard) ... INFO:numba.cuda.cudadrv.driver:init 2022-12-01T11:06:33.2078893Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57138 2022-12-01T11:06:33.2079351Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57139 2022-12-01T11:06:33.2079787Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57140 2022-12-01T11:06:33.2080213Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 57141 2022-12-01T11:06:33.2080840Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:33.2081305Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:33.2082124Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:33.2083004Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:33.2083607Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:33.2084058Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:33.2084622Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:33.2085091Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:33.2085682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:33.2086125Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:33.2086682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:33.2087154Z 
warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:33.2087737Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:33.2088160Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:33.2088738Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:33.2089204Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:33.2089641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:06:33.2090101Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:06:33.2090719Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:06:33.2091192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:06:33.2091572Z skip: Need at least 4 CUDA devices (3.464s) 2022-12-01T11:06:33.2092052Z test_sharded_tensor_reshard_errors (__main__.TestReshard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57274 2022-12-01T11:06:33.2092580Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57275 2022-12-01T11:06:33.2093024Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57276 2022-12-01T11:06:33.2093447Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 57277 2022-12-01T11:06:33.2094061Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:33.2094516Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:33.2095098Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:33.2095556Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:33.2096135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:33.2096589Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:33.2097135Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:33.2097602Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:33.2098180Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:33.2098625Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:33.2099186Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:33.2099732Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:33.2100336Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:33.2100759Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:33.2101331Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:33.2101796Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2022-12-01T11:06:33.2102230Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-12-01T11:06:33.2102698Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-12-01T11:06:33.2103145Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-12-01T11:06:33.2103622Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-12-01T11:06:33.2103997Z skip: Need at least 4 CUDA devices (1.909s) 2022-12-01T11:06:33.2104197Z 2022-12-01T11:06:33.2104476Z ---------------------------------------------------------------------- 2022-12-01T11:06:33.2104807Z Ran 2 tests in 5.373s 2022-12-01T11:06:33.2104970Z 2022-12-01T11:06:33.2105080Z OK (skipped=2) 2022-12-01T11:06:33.2105235Z 2022-12-01T11:06:33.2105343Z Generating XML reports... 2022-12-01T11:06:33.2105971Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard/TEST-TestReshard-20221201110627.xml 2022-12-01T11:06:33.2106349Z 2022-12-01T11:06:33.2106676Z ##[endgroup] 2022-12-01T11:06:33.2107374Z FINISHED PRINTING LOG FILE of distributed/_shard/sharded_tensor/test_sharded_tensor_reshard (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-sharded_tensor-test_sharded_tensor_reshard_qccbinq1) 2022-12-01T11:06:33.2107892Z 2022-12-01T11:06:33.2108183Z Running distributed/elastic/timer/local_timer_test ... [2022-12-01 11:06:33.207231] 2022-12-01T11:06:33.2108910Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/elastic/timer/local_timer_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:06:33.207562] 2022-12-01T11:06:40.4392472Z 2022-12-01T11:06:40.4393207Z Expand the folded group to see the log file of distributed/elastic/timer/local_timer_test 2022-12-01T11:06:40.4394210Z ##[group]PRINTING LOG FILE of distributed/elastic/timer/local_timer_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-timer-local_timer_test_rbxv20nt) 2022-12-01T11:06:40.4394657Z 2022-12-01T11:06:40.4394779Z Running tests... 2022-12-01T11:06:40.4395276Z ---------------------------------------------------------------------- 2022-12-01T11:06:40.4395864Z Test results will be stored in test-reports/python-unittest/distributed.elastic.timer.local_timer_test 2022-12-01T11:06:40.4396365Z test_acquire_release (__main__.LocalTimerServerTest) 2022-12-01T11:06:40.4397366Z tests that: ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/87154 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.553s) 2022-12-01T11:06:40.4398067Z test_expired_timers (__main__.LocalTimerServerTest) 2022-12-01T11:06:40.4398479Z tests that a single expired timer on a process should terminate ... ok (0.004s) 2022-12-01T11:06:40.4398875Z test_valid_timers (__main__.LocalTimerServerTest) 2022-12-01T11:06:40.4399272Z tests that valid timers are processed correctly and the process is left alone ... ok (0.003s) 2022-12-01T11:06:40.4399694Z test_watchdog_call_count (__main__.LocalTimerServerTest) 2022-12-01T11:06:40.4400178Z checks that the watchdog function ran wait/interval +- 1 times ... 
ok (0.104s) 2022-12-01T11:06:40.4400592Z test_watchdog_empty_queue (__main__.LocalTimerServerTest) 2022-12-01T11:06:40.4401220Z checks that the watchdog can run on an empty queue ... ok (0.011s) 2022-12-01T11:06:40.4401678Z test_client_interaction (__main__.LocalTimerTest) ... ok (0.004s) 2022-12-01T11:06:40.4402099Z test_exception_propagation (__main__.LocalTimerTest) ... ok (0.011s) 2022-12-01T11:06:40.4402935Z test_get_timer_recursive (__main__.LocalTimerTest) 2022-12-01T11:06:40.4403688Z If a function acquires a countdown timer with default scope, ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:40.4404216Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:40.4404796Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:40.4405252Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:40.4405576Z ok (1.911s) 2022-12-01T11:06:40.4405880Z test_happy_path (__main__.LocalTimerTest) ... ok (0.103s) 2022-12-01T11:06:40.4406225Z test_no_client (__main__.LocalTimerTest) ... ok (0.011s) 2022-12-01T11:06:40.4406573Z test_timer (__main__.LocalTimerTest) ... ok (0.154s) 2022-12-01T11:06:40.4406986Z test_get (__main__.MultiprocessingRequestQueueTest) ... ok (0.023s) 2022-12-01T11:06:40.4407428Z test_get_less_than_size (__main__.MultiprocessingRequestQueueTest) 2022-12-01T11:06:40.4407773Z Tests slow producer. ... ok (0.515s) 2022-12-01T11:06:40.4408141Z test_get_size (__main__.MultiprocessingRequestQueueTest) 2022-12-01T11:06:40.4408534Z Creates a "producer" process that enqueues ``n`` elements ... ok (0.920s) 2022-12-01T11:06:40.4408772Z 2022-12-01T11:06:40.4409032Z ---------------------------------------------------------------------- 2022-12-01T11:06:40.4409372Z Ran 14 tests in 5.331s 2022-12-01T11:06:40.4409695Z 2022-12-01T11:06:40.4409808Z OK (skipped=1) 2022-12-01T11:06:40.4409963Z 2022-12-01T11:06:40.4410091Z Generating XML reports... 2022-12-01T11:06:40.4410729Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerServerTest-20221201110634.xml 2022-12-01T11:06:40.4411553Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerTest-20221201110634.xml 2022-12-01T11:06:40.4412410Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-MultiprocessingRequestQueueTest-20221201110634.xml 2022-12-01T11:06:40.4412831Z 2022-12-01T11:06:40.4413141Z ##[endgroup] 2022-12-01T11:06:40.4413790Z FINISHED PRINTING LOG FILE of distributed/elastic/timer/local_timer_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-timer-local_timer_test_rbxv20nt) 2022-12-01T11:06:40.4414181Z 2022-12-01T11:06:40.4414475Z Running distributed/elastic/utils/distributed_test ... [2022-12-01 11:06:40.439247] 2022-12-01T11:06:40.4415196Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/elastic/utils/distributed_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-12-01 11:06:40.439576] 2022-12-01T11:06:46.9797095Z 2022-12-01T11:06:46.9797594Z Expand the folded group to see the log file of distributed/elastic/utils/distributed_test 2022-12-01T11:06:46.9799189Z ##[group]PRINTING LOG FILE of distributed/elastic/utils/distributed_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-utils-distributed_test_be7nnr5e) 2022-12-01T11:06:46.9799635Z 2022-12-01T11:06:46.9799753Z Running tests... 2022-12-01T11:06:46.9800262Z ---------------------------------------------------------------------- 2022-12-01T11:06:46.9800866Z Test results will be stored in test-reports/python-unittest/distributed.elastic.utils.distributed_test 2022-12-01T11:06:46.9801365Z test_create_store_multi (__main__.DistributedUtilTest) ... ok (1.622s) 2022-12-01T11:06:46.9801789Z test_create_store_no_port_multi (__main__.DistributedUtilTest) ... ok (0.001s) 2022-12-01T11:06:46.9803488Z test_create_store_single_server (__main__.DistributedUtilTest) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/66207 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.000s) 2022-12-01T11:06:46.9804336Z test_create_store_timeout_on_server (__main__.DistributedUtilTest) ... ok (3.026s) 2022-12-01T11:06:46.9804902Z test_create_store_timeout_on_worker (__main__.DistributedUtilTest) ... [E socket.cpp:860] [c10d] The client socket has timed out after 1s while trying to connect to (66307f3ad701, 0). 2022-12-01T11:06:46.9805341Z ok (0.001s) 2022-12-01T11:06:46.9805981Z test_port_already_in_use_on_server (__main__.DistributedUtilTest) ... [W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:39521 (errno: 98 - Address already in use). 2022-12-01T11:06:46.9806673Z [W socket.cpp:426] [c10d] The server socket has failed to bind to 0.0.0.0:39521 (errno: 98 - Address already in use). 2022-12-01T11:06:46.9807156Z [E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address. 2022-12-01T11:06:46.9807493Z ok (0.004s) 2022-12-01T11:06:46.9807953Z test_port_already_in_use_on_worker (__main__.DistributedUtilTest) ... [E socket.cpp:860] [c10d] The client socket has timed out after 1s while trying to connect to (66307f3ad701, 56765). 2022-12-01T11:06:46.9808393Z ok (0.003s) 2022-12-01T11:06:46.9808543Z 2022-12-01T11:06:46.9808816Z ---------------------------------------------------------------------- 2022-12-01T11:06:46.9809151Z Ran 7 tests in 4.659s 2022-12-01T11:06:46.9809315Z 2022-12-01T11:06:46.9809406Z OK (skipped=1) 2022-12-01T11:06:46.9809560Z 2022-12-01T11:06:46.9809687Z Generating XML reports... 2022-12-01T11:06:46.9810336Z Generated XML report: test-reports/python-unittest/distributed.elastic.utils.distributed_test/TEST-DistributedUtilTest-20221201110641.xml 2022-12-01T11:06:46.9810860Z 2022-12-01T11:06:46.9811168Z ##[endgroup] 2022-12-01T11:06:46.9811844Z FINISHED PRINTING LOG FILE of distributed/elastic/utils/distributed_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-utils-distributed_test_be7nnr5e) 2022-12-01T11:06:46.9812245Z 2022-12-01T11:06:46.9812516Z Running distributed/rpc/test_share_memory ... [2022-12-01 11:06:46.979721] 2022-12-01T11:06:46.9813231Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/rpc/test_share_memory.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-12-01 11:06:46.980072] 2022-12-01T11:06:54.2952378Z 2022-12-01T11:06:54.2953137Z Expand the folded group to see the log file of distributed/rpc/test_share_memory 2022-12-01T11:06:54.2954117Z ##[group]PRINTING LOG FILE of distributed/rpc/test_share_memory (/var/lib/jenkins/workspace/test/test-reports/distributed-rpc-test_share_memory_i5wrljp3) 2022-12-01T11:06:54.2954965Z 2022-12-01T11:06:54.2955353Z ]> 2022-12-01T11:06:54.2955743Z test_case (__main__.TestRPCPickler) 2022-12-01T11:06:54.2956431Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:54.2956866Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:54.2957450Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:54.2958097Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:54.2958494Z 2022-12-01T11:06:54.2958623Z Running tests... 2022-12-01T11:06:54.2959013Z ---------------------------------------------------------------------- 2022-12-01T11:06:54.2959570Z Test results will be stored in test-reports/python-unittest/distributed.rpc.test_share_memory 2022-12-01T11:06:54.2960297Z test_case (__main__.TestRPCPickler) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 62 slow tests 2022-12-01T11:06:54.2961067Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2022-12-01T11:06:54.2961671Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 421 disabled tests 2022-12-01T11:06:54.2962150Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2022-12-01T11:06:54.2963124Z ok (3.852s) 2022-12-01T11:06:54.2963285Z 2022-12-01T11:06:54.2963548Z ---------------------------------------------------------------------- 2022-12-01T11:06:54.2963876Z Ran 1 test in 3.853s 2022-12-01T11:06:54.2964037Z 2022-12-01T11:06:54.2964132Z OK 2022-12-01T11:06:54.2964266Z 2022-12-01T11:06:54.2964392Z Generating XML reports... 2022-12-01T11:06:54.2964964Z Generated XML report: test-reports/python-unittest/distributed.rpc.test_share_memory/TEST-TestRPCPickler-20221201110649.xml 2022-12-01T11:06:54.2965320Z 2022-12-01T11:06:54.2965646Z ##[endgroup] 2022-12-01T11:06:54.2966251Z FINISHED PRINTING LOG FILE of distributed/rpc/test_share_memory (/var/lib/jenkins/workspace/test/test-reports/distributed-rpc-test_share_memory_i5wrljp3) 2022-12-01T11:06:54.2966589Z 2022-12-01T11:06:54.2966860Z Running distributed/elastic/utils/util_test ... [2022-12-01 11:06:54.295278] 2022-12-01T11:06:54.2967559Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/elastic/utils/util_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:06:54.295671] 2022-12-01T11:06:57.7513806Z 2022-12-01T11:06:57.7514288Z Expand the folded group to see the log file of distributed/elastic/utils/util_test 2022-12-01T11:06:57.7515379Z ##[group]PRINTING LOG FILE of distributed/elastic/utils/util_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-utils-util_test_29vdecgb) 2022-12-01T11:06:57.7516047Z 2022-12-01T11:06:57.7516459Z Running tests... 
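The "Executing [...]" entries above show the argv the test harness passes to a fresh interpreter for each test file. A minimal sketch of replaying one such invocation locally (interpreter path and flags copied from the log; assumes the working directory is the pytorch checkout's test/ directory):

```python
# Sketch only: replay one "Executing [...]" invocation from the log above.
# /opt/conda/bin/python and the flags are taken verbatim from the log entry;
# on another machine the interpreter path will differ.
import subprocess

cmd = [
    "/opt/conda/bin/python", "-bb",          # -bb: raise on bytes/str comparisons
    "distributed/elastic/utils/util_test.py",
    "-v",
    "--import-slow-tests",                   # load the slow-test list, as CI does
    "--import-disabled-tests",               # load the disabled-test list, as CI does
]
result = subprocess.run(cmd, check=False)
print(f"exit code: {result.returncode}")
```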
2022-12-01T11:06:57.7516961Z ---------------------------------------------------------------------- 2022-12-01T11:06:57.7517547Z Test results will be stored in test-reports/python-unittest/distributed.elastic.utils.util_test 2022-12-01T11:06:57.7518034Z test_get_all_rank_0 (__main__.StoreUtilTest) ... ok (1.541s) 2022-12-01T11:06:57.7518716Z test_get_all_rank_n (__main__.StoreUtilTest) ... ok (0.002s) 2022-12-01T11:06:57.7519080Z test_synchronize (__main__.StoreUtilTest) ... ok (0.003s) 2022-12-01T11:06:57.7519425Z test_get_logger (__main__.UtilTest) ... ok (0.083s) 2022-12-01T11:06:57.7519778Z test_get_logger_custom_name (__main__.UtilTest) ... ok (0.001s) 2022-12-01T11:06:57.7520127Z test_get_logger_different (__main__.UtilTest) ... ok (0.001s) 2022-12-01T11:06:57.7520482Z test_get_logger_none (__main__.UtilTest) ... ok (0.001s) 2022-12-01T11:06:57.7520688Z 2022-12-01T11:06:57.7520968Z ---------------------------------------------------------------------- 2022-12-01T11:06:57.7521289Z Ran 7 tests in 1.632s 2022-12-01T11:06:57.7521452Z 2022-12-01T11:06:57.7521551Z OK 2022-12-01T11:06:57.7521682Z 2022-12-01T11:06:57.7521809Z Generating XML reports... 2022-12-01T11:06:57.7522759Z Generated XML report: test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-StoreUtilTest-20221201110655.xml 2022-12-01T11:06:57.7523516Z Generated XML report: test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-UtilTest-20221201110655.xml 2022-12-01T11:06:57.7523861Z 2022-12-01T11:06:57.7524170Z ##[endgroup] 2022-12-01T11:06:57.7524786Z FINISHED PRINTING LOG FILE of distributed/elastic/utils/util_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-utils-util_test_29vdecgb) 2022-12-01T11:06:57.7525153Z 2022-12-01T11:06:57.7525409Z Running distributed/nn/jit/test_instantiator ... [2022-12-01 11:06:57.751391] 2022-12-01T11:06:57.7526107Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/nn/jit/test_instantiator.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:06:57.751658] 2022-12-01T11:07:01.1322968Z 2022-12-01T11:07:01.1323729Z Expand the folded group to see the log file of distributed/nn/jit/test_instantiator 2022-12-01T11:07:01.1324944Z ##[group]PRINTING LOG FILE of distributed/nn/jit/test_instantiator (/var/lib/jenkins/workspace/test/test-reports/distributed-nn-jit-test_instantiator_atghh_y3) 2022-12-01T11:07:01.1325363Z 2022-12-01T11:07:01.1325455Z Running tests... 2022-12-01T11:07:01.1325988Z ---------------------------------------------------------------------- 2022-12-01T11:07:01.1326556Z Test results will be stored in test-reports/python-unittest/distributed.nn.jit.test_instantiator 2022-12-01T11:07:01.1327048Z test_get_arg_return_types_from_interface (__main__.TestInstantiator) ... ok (1.557s) 2022-12-01T11:07:01.1327712Z test_instantiate_non_scripted_remote_module_template (__main__.TestInstantiator) ... ok (0.002s) 2022-12-01T11:07:01.1328727Z test_instantiate_scripted_remote_module_template (__main__.TestInstantiator) ... ok (0.014s) 2022-12-01T11:07:01.1329212Z 2022-12-01T11:07:01.1329705Z ---------------------------------------------------------------------- 2022-12-01T11:07:01.1330269Z Ran 3 tests in 1.574s 2022-12-01T11:07:01.1330534Z 2022-12-01T11:07:01.1330689Z OK 2022-12-01T11:07:01.1330914Z 2022-12-01T11:07:01.1331131Z Generating XML reports... 
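Each "Generated XML report:" entry names a per-suite result file. A rough sketch of summarizing one of them with the standard library, assuming the reports use the common JUnit-style layout (a testsuite element with tests/failures/errors/skipped attributes); the path below is copied from the util_test entries above:

```python
# Sketch only: read one XML report named in the log above and print its counts.
# The JUnit-style attribute names are an assumption about the report schema.
import xml.etree.ElementTree as ET

report = ("test-reports/python-unittest/distributed.elastic.utils.util_test/"
          "TEST-UtilTest-20221201110655.xml")
root = ET.parse(report).getroot()
# The root may be a single <testsuite> or a <testsuites> wrapper.
suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
for suite in suites:
    print(suite.get("name"),
          "tests:", suite.get("tests"),
          "failures:", suite.get("failures"),
          "errors:", suite.get("errors"),
          "skipped:", suite.get("skipped"))
```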
2022-12-01T11:07:01.1332322Z Generated XML report: test-reports/python-unittest/distributed.nn.jit.test_instantiator/TEST-TestInstantiator-20221201110659.xml 2022-12-01T11:07:01.1333022Z 2022-12-01T11:07:01.1333599Z ##[endgroup] 2022-12-01T11:07:01.1334883Z FINISHED PRINTING LOG FILE of distributed/nn/jit/test_instantiator (/var/lib/jenkins/workspace/test/test-reports/distributed-nn-jit-test_instantiator_atghh_y3) 2022-12-01T11:07:01.1335638Z 2022-12-01T11:07:01.1336164Z Running distributed/test_launcher ... [2022-12-01 11:07:01.132290] 2022-12-01T11:07:01.1337445Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/test_launcher.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:07:01.132548] 2022-12-01T11:07:04.8499261Z 2022-12-01T11:07:04.8499704Z Expand the folded group to see the log file of distributed/test_launcher 2022-12-01T11:07:04.8500764Z ##[group]PRINTING LOG FILE of distributed/test_launcher (/var/lib/jenkins/workspace/test/test-reports/distributed-test_launcher_2q9qmfa4) 2022-12-01T11:07:04.8501415Z 2022-12-01T11:07:04.8501617Z Running tests... 2022-12-01T11:07:04.8502138Z ---------------------------------------------------------------------- 2022-12-01T11:07:04.8502680Z Test results will be stored in test-reports/python-unittest/distributed.test_launcher 2022-12-01T11:07:04.8504141Z test_launch_user_script (__main__.TestDistributedLaunch) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/79488 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (1.527s) 2022-12-01T11:07:04.8504773Z 2022-12-01T11:07:04.8505060Z ---------------------------------------------------------------------- 2022-12-01T11:07:04.8505402Z Ran 1 test in 1.528s 2022-12-01T11:07:04.8505560Z 2022-12-01T11:07:04.8505651Z OK (skipped=1) 2022-12-01T11:07:04.8505805Z 2022-12-01T11:07:04.8505930Z Generating XML reports... 2022-12-01T11:07:04.8506523Z Generated XML report: test-reports/python-unittest/distributed.test_launcher/TEST-TestDistributedLaunch-20221201110702.xml 2022-12-01T11:07:04.8506877Z 2022-12-01T11:07:04.8507169Z ##[endgroup] 2022-12-01T11:07:04.8507733Z FINISHED PRINTING LOG FILE of distributed/test_launcher (/var/lib/jenkins/workspace/test/test-reports/distributed-test_launcher_2q9qmfa4) 2022-12-01T11:07:04.8508064Z 2022-12-01T11:07:04.8508354Z Running distributed/_shard/test_replicated_tensor ... [2022-12-01 11:07:04.849929] 2022-12-01T11:07:04.8509069Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/_shard/test_replicated_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:07:04.850218] 2022-12-01T11:07:06.5855208Z 2022-12-01T11:07:06.5856006Z Expand the folded group to see the log file of distributed/_shard/test_replicated_tensor 2022-12-01T11:07:06.5857376Z ##[group]PRINTING LOG FILE of distributed/_shard/test_replicated_tensor (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-test_replicated_tensor_qkqnpg6s) 2022-12-01T11:07:06.5857785Z 2022-12-01T11:07:06.5858071Z ##[endgroup] 2022-12-01T11:07:06.5858953Z FINISHED PRINTING LOG FILE of distributed/_shard/test_replicated_tensor (/var/lib/jenkins/workspace/test/test-reports/distributed-_shard-test_replicated_tensor_qkqnpg6s) 2022-12-01T11:07:06.5859554Z 2022-12-01T11:07:06.5860032Z Running distributed/elastic/timer/api_test ... 
[2022-12-01 11:07:06.585539] 2022-12-01T11:07:06.5861395Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/elastic/timer/api_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:07:06.585823] 2022-12-01T11:07:08.0277667Z 2022-12-01T11:07:08.0278186Z Expand the folded group to see the log file of distributed/elastic/timer/api_test 2022-12-01T11:07:08.0279214Z ##[group]PRINTING LOG FILE of distributed/elastic/timer/api_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-timer-api_test_htuaefbe) 2022-12-01T11:07:08.0279609Z 2022-12-01T11:07:08.0279887Z ##[endgroup] 2022-12-01T11:07:08.0280623Z FINISHED PRINTING LOG FILE of distributed/elastic/timer/api_test (/var/lib/jenkins/workspace/test/test-reports/distributed-elastic-timer-api_test_htuaefbe) 2022-12-01T11:07:08.0280988Z 2022-12-01T11:07:08.0281280Z Running distributed/fsdp/test_shard_utils ... [2022-12-01 11:07:08.027799] 2022-12-01T11:07:08.0284481Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/fsdp/test_shard_utils.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:07:08.028076] 2022-12-01T11:07:09.7353301Z 2022-12-01T11:07:09.7354063Z Expand the folded group to see the log file of distributed/fsdp/test_shard_utils 2022-12-01T11:07:09.7355021Z ##[group]PRINTING LOG FILE of distributed/fsdp/test_shard_utils (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_shard_utils_bic9oqs7) 2022-12-01T11:07:09.7355398Z 2022-12-01T11:07:09.7355694Z ##[endgroup] 2022-12-01T11:07:09.7356409Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_shard_utils (/var/lib/jenkins/workspace/test/test-reports/distributed-fsdp-test_shard_utils_bic9oqs7) 2022-12-01T11:07:09.7356748Z 2022-12-01T11:07:09.7357087Z Running distributed/pipeline/sync/skip/test_inspect_skip_layout ... [2022-12-01 11:07:09.735389] 2022-12-01T11:07:09.7359549Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_inspect_skip_layout.py', '-v'] ... [2022-12-01 11:07:09.735664] 2022-12-01T11:07:12.2119065Z 2022-12-01T11:07:12.2119619Z Expand the folded group to see the log file of distributed/pipeline/sync/skip/test_inspect_skip_layout 2022-12-01T11:07:12.2120732Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/skip/test_inspect_skip_layout (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_inspect_skip_layout__kkvmzta) 2022-12-01T11:07:12.2121355Z ============================= test session starts ============================== 2022-12-01T11:07:12.2121956Z platform linux -- Python 3.10.4, pytest-7.1.3, pluggy-1.0.0 -- /opt/conda/bin/python 2022-12-01T11:07:12.2122326Z cachedir: .pytest_cache 2022-12-01T11:07:12.2123558Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-12-01T11:07:12.2123988Z torch: 1.13.0a0+gitc13d400 2022-12-01T11:07:12.2124312Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-12-01T11:07:12.2124872Z plugins: hypothesis-5.35.1, forked-1.4.0, rerunfailures-10.2, shard-0.1.2, xdist-2.5.0, xdoctest-1.0.2 2022-12-01T11:07:12.2125245Z collecting ... 
collected 6 items 2022-12-01T11:07:12.2126541Z Running 6 items in this shard: test/distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_no_skippables, test/distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_inner_partition, test/distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_adjoining_partitions, test/distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_far_partitions, test/distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_pop_2_from_different_partitions, test/distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_namespace 2022-12-01T11:07:12.2127431Z 2022-12-01T11:07:12.2127677Z distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_no_skippables PASSED [ 16%] 2022-12-01T11:07:12.2128168Z distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_inner_partition PASSED [ 33%] 2022-12-01T11:07:12.2128669Z distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_adjoining_partitions PASSED [ 50%] 2022-12-01T11:07:12.2129191Z distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_far_partitions PASSED [ 66%] 2022-12-01T11:07:12.2129696Z distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_pop_2_from_different_partitions PASSED [ 83%] 2022-12-01T11:07:12.2130202Z distributed/pipeline/sync/skip/test_inspect_skip_layout.py::test_namespace PASSED [100%] 2022-12-01T11:07:12.2130470Z 2022-12-01T11:07:12.2130627Z ============================== 6 passed in 0.05s =============================== 2022-12-01T11:07:12.2130819Z 2022-12-01T11:07:12.2131116Z ##[endgroup] 2022-12-01T11:07:12.2131844Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/skip/test_inspect_skip_layout (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_inspect_skip_layout__kkvmzta) 2022-12-01T11:07:12.2132279Z 2022-12-01T11:07:12.2132577Z Running distributed/pipeline/sync/skip/test_stash_pop ... [2022-12-01 11:07:12.211948] 2022-12-01T11:07:12.2133209Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_stash_pop.py', '-v'] ... [2022-12-01 11:07:12.212229] 2022-12-01T11:07:14.2080407Z 2022-12-01T11:07:14.2080982Z Expand the folded group to see the log file of distributed/pipeline/sync/skip/test_stash_pop 2022-12-01T11:07:14.2083112Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/skip/test_stash_pop (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_stash_pop_jv9vm9tc) 2022-12-01T11:07:14.2083709Z ============================= test session starts ============================== 2022-12-01T11:07:14.2084310Z platform linux -- Python 3.10.4, pytest-7.1.3, pluggy-1.0.0 -- /opt/conda/bin/python 2022-12-01T11:07:14.2084673Z cachedir: .pytest_cache 2022-12-01T11:07:14.2085226Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-12-01T11:07:14.2085663Z torch: 1.13.0a0+gitc13d400 2022-12-01T11:07:14.2085989Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-12-01T11:07:14.2086565Z plugins: hypothesis-5.35.1, forked-1.4.0, rerunfailures-10.2, shard-0.1.2, xdist-2.5.0, xdoctest-1.0.2 2022-12-01T11:07:14.2086941Z collecting ... 
collected 7 items 2022-12-01T11:07:14.2087932Z Running 7 items in this shard: test/distributed/pipeline/sync/skip/test_stash_pop.py::test_stash, test/distributed/pipeline/sync/skip/test_stash_pop.py::test_pop, test/distributed/pipeline/sync/skip/test_stash_pop.py::test_declare_but_not_use, test/distributed/pipeline/sync/skip/test_stash_pop.py::test_stash_not_declared, test/distributed/pipeline/sync/skip/test_stash_pop.py::test_pop_not_declared, test/distributed/pipeline/sync/skip/test_stash_pop.py::test_pop_not_stashed, test/distributed/pipeline/sync/skip/test_stash_pop.py::test_stash_none 2022-12-01T11:07:14.2088755Z 2022-12-01T11:07:14.2088974Z distributed/pipeline/sync/skip/test_stash_pop.py::test_stash PASSED [ 14%] 2022-12-01T11:07:14.2089428Z distributed/pipeline/sync/skip/test_stash_pop.py::test_pop PASSED [ 28%] 2022-12-01T11:07:14.2089886Z distributed/pipeline/sync/skip/test_stash_pop.py::test_declare_but_not_use PASSED [ 42%] 2022-12-01T11:07:14.2090351Z distributed/pipeline/sync/skip/test_stash_pop.py::test_stash_not_declared PASSED [ 57%] 2022-12-01T11:07:14.2091048Z distributed/pipeline/sync/skip/test_stash_pop.py::test_pop_not_declared PASSED [ 71%] 2022-12-01T11:07:14.2091539Z distributed/pipeline/sync/skip/test_stash_pop.py::test_pop_not_stashed PASSED [ 85%] 2022-12-01T11:07:14.2091988Z distributed/pipeline/sync/skip/test_stash_pop.py::test_stash_none PASSED [100%] 2022-12-01T11:07:14.2092238Z 2022-12-01T11:07:14.2092378Z ============================== 7 passed in 0.05s =============================== 2022-12-01T11:07:14.2092573Z 2022-12-01T11:07:14.2092891Z ##[endgroup] 2022-12-01T11:07:14.2093575Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/skip/test_stash_pop (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-skip-test_stash_pop_jv9vm9tc) 2022-12-01T11:07:14.2093989Z 2022-12-01T11:07:14.2094258Z Running distributed/pipeline/sync/test_balance ... [2022-12-01 11:07:14.208083] 2022-12-01T11:07:14.2094873Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_balance.py', '-v'] ... [2022-12-01 11:07:14.208441] 2022-12-01T11:07:23.1310250Z 2022-12-01T11:07:23.1310999Z Expand the folded group to see the log file of distributed/pipeline/sync/test_balance 2022-12-01T11:07:23.1312217Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_balance (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_balance_lyjbhuhu) 2022-12-01T11:07:23.1312790Z ============================= test session starts ============================== 2022-12-01T11:07:23.1313407Z platform linux -- Python 3.10.4, pytest-7.1.3, pluggy-1.0.0 -- /opt/conda/bin/python 2022-12-01T11:07:23.1313753Z cachedir: .pytest_cache 2022-12-01T11:07:23.1314567Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-12-01T11:07:23.1315352Z torch: 1.13.0a0+gitc13d400 2022-12-01T11:07:23.1316042Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-12-01T11:07:23.1317138Z plugins: hypothesis-5.35.1, forked-1.4.0, rerunfailures-10.2, shard-0.1.2, xdist-2.5.0, xdoctest-1.0.2 2022-12-01T11:07:23.1317875Z collecting ... 
collected 18 items 2022-12-01T11:07:23.1322091Z Running 18 items in this shard: test/distributed/pipeline/sync/test_balance.py::test_blockpartition, test/distributed/pipeline/sync/test_balance.py::test_blockpartition_zeros, test/distributed/pipeline/sync/test_balance.py::test_blockpartition_non_positive_partitions, test/distributed/pipeline/sync/test_balance.py::test_blockpartition_short_sequence, test/distributed/pipeline/sync/test_balance.py::test_balance_by_time[cpu], test/distributed/pipeline/sync/test_balance.py::test_balance_by_time[cuda], test/distributed/pipeline/sync/test_balance.py::test_balance_by_time_loop_resets_input, test/distributed/pipeline/sync/test_balance.py::test_balance_by_size_latent, test/distributed/pipeline/sync/test_balance.py::test_balance_by_size_param, test/distributed/pipeline/sync/test_balance.py::test_balance_by_size_param_scale, test/distributed/pipeline/sync/test_balance.py::test_layerwise_sandbox[cpu], test/distributed/pipeline/sync/test_balance.py::test_layerwise_sandbox[cuda], test/distributed/pipeline/sync/test_balance.py::test_sandbox_during_profiling[cpu], test/distributed/pipeline/sync/test_balance.py::test_sandbox_during_profiling[cuda], test/distributed/pipeline/sync/test_balance.py::test_not_training, test/distributed/pipeline/sync/test_balance.py::test_balance_by_time_tuple, test/distributed/pipeline/sync/test_balance.py::test_balance_by_size_tuple, test/distributed/pipeline/sync/test_balance.py::test_already_has_grad 2022-12-01T11:07:23.1326660Z 2022-12-01T11:07:23.1327141Z distributed/pipeline/sync/test_balance.py::test_blockpartition PASSED [ 5%] 2022-12-01T11:07:23.1328070Z distributed/pipeline/sync/test_balance.py::test_blockpartition_zeros PASSED [ 11%] 2022-12-01T11:07:23.1329123Z distributed/pipeline/sync/test_balance.py::test_blockpartition_non_positive_partitions PASSED [ 16%] 2022-12-01T11:07:23.1330305Z distributed/pipeline/sync/test_balance.py::test_blockpartition_short_sequence PASSED [ 22%] 2022-12-01T11:07:23.1331279Z distributed/pipeline/sync/test_balance.py::test_balance_by_time[cpu] SKIPPED [ 27%] 2022-12-01T11:07:23.1332154Z distributed/pipeline/sync/test_balance.py::test_balance_by_time[cuda] SKIPPED [ 33%] 2022-12-01T11:07:23.1333131Z distributed/pipeline/sync/test_balance.py::test_balance_by_time_loop_resets_input PASSED [ 38%] 2022-12-01T11:07:23.1334056Z distributed/pipeline/sync/test_balance.py::test_balance_by_size_latent PASSED [ 44%] 2022-12-01T11:07:23.1334907Z distributed/pipeline/sync/test_balance.py::test_balance_by_size_param PASSED [ 50%] 2022-12-01T11:07:23.1335793Z distributed/pipeline/sync/test_balance.py::test_balance_by_size_param_scale PASSED [ 55%] 2022-12-01T11:07:23.1336717Z distributed/pipeline/sync/test_balance.py::test_layerwise_sandbox[cpu] PASSED [ 61%] 2022-12-01T11:07:23.1337646Z distributed/pipeline/sync/test_balance.py::test_layerwise_sandbox[cuda] PASSED [ 66%] 2022-12-01T11:07:23.1338585Z distributed/pipeline/sync/test_balance.py::test_sandbox_during_profiling[cpu] PASSED [ 72%] 2022-12-01T11:07:23.1339556Z distributed/pipeline/sync/test_balance.py::test_sandbox_during_profiling[cuda] PASSED [ 77%] 2022-12-01T11:07:23.1340489Z distributed/pipeline/sync/test_balance.py::test_not_training PASSED [ 83%] 2022-12-01T11:07:23.1341401Z distributed/pipeline/sync/test_balance.py::test_balance_by_time_tuple PASSED [ 88%] 2022-12-01T11:07:23.1342275Z distributed/pipeline/sync/test_balance.py::test_balance_by_size_tuple PASSED [ 94%] 2022-12-01T11:07:23.1343177Z 
distributed/pipeline/sync/test_balance.py::test_already_has_grad PASSED [100%] 2022-12-01T11:07:23.1343673Z 2022-12-01T11:07:23.1343986Z ======================== 16 passed, 2 skipped in 6.84s ========================= 2022-12-01T11:07:23.1344567Z 2022-12-01T11:07:23.1345154Z ##[endgroup] 2022-12-01T11:07:23.1346483Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_balance (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_balance_lyjbhuhu) 2022-12-01T11:07:23.1347271Z 2022-12-01T11:07:23.1347832Z Running distributed/pipeline/sync/test_copy ... [2022-12-01 11:07:23.131063] 2022-12-01T11:07:23.1349054Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_copy.py', '-v'] ... [2022-12-01 11:07:23.131364] 2022-12-01T11:07:27.2261644Z 2022-12-01T11:07:27.2262366Z Expand the folded group to see the log file of distributed/pipeline/sync/test_copy 2022-12-01T11:07:27.2264337Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_copy (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_copy_eghpeu18) 2022-12-01T11:07:27.2265149Z ============================= test session starts ============================== 2022-12-01T11:07:27.2265778Z platform linux -- Python 3.10.4, pytest-7.1.3, pluggy-1.0.0 -- /opt/conda/bin/python 2022-12-01T11:07:27.2266137Z cachedir: .pytest_cache 2022-12-01T11:07:27.2266723Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-12-01T11:07:27.2267160Z torch: 1.13.0a0+gitc13d400 2022-12-01T11:07:27.2267472Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-12-01T11:07:27.2268476Z plugins: hypothesis-5.35.1, forked-1.4.0, rerunfailures-10.2, shard-0.1.2, xdist-2.5.0, xdoctest-1.0.2 2022-12-01T11:07:27.2269123Z collecting ... collected 5 items 2022-12-01T11:07:27.2270443Z Running 5 items in this shard: test/distributed/pipeline/sync/test_copy.py::test_copy_wait_cpu_cpu, test/distributed/pipeline/sync/test_copy.py::test_copy_wait_cpu_cuda, test/distributed/pipeline/sync/test_copy.py::test_copy_wait_cuda_cpu, test/distributed/pipeline/sync/test_copy.py::test_copy_wait_cuda_cuda, test/distributed/pipeline/sync/test_copy.py::test_wait_multiple_tensors 2022-12-01T11:07:27.2271508Z 2022-12-01T11:07:27.2271838Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cpu_cpu PASSED [ 20%] 2022-12-01T11:07:27.2272800Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cpu_cuda PASSED [ 40%] 2022-12-01T11:07:27.2273493Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cuda_cpu PASSED [ 60%] 2022-12-01T11:07:27.2274164Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cuda_cuda PASSED [ 80%] 2022-12-01T11:07:27.2274860Z distributed/pipeline/sync/test_copy.py::test_wait_multiple_tensors PASSED [100%] 2022-12-01T11:07:27.2275320Z 2022-12-01T11:07:27.2275569Z ============================== 5 passed in 2.10s =============================== 2022-12-01T11:07:27.2275858Z 2022-12-01T11:07:27.2276367Z ##[endgroup] 2022-12-01T11:07:27.2277467Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_copy (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_copy_eghpeu18) 2022-12-01T11:07:27.2278142Z 2022-12-01T11:07:27.2278645Z Running distributed/pipeline/sync/test_inplace ... [2022-12-01 11:07:27.226188] 2022-12-01T11:07:27.2279319Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_inplace.py', '-v'] ... 
[2022-12-01 11:07:27.226493] 2022-12-01T11:07:29.3496703Z 2022-12-01T11:07:29.3497443Z Expand the folded group to see the log file of distributed/pipeline/sync/test_inplace 2022-12-01T11:07:29.3498470Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_inplace (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_inplace_5isheyu6) 2022-12-01T11:07:29.3499051Z ============================= test session starts ============================== 2022-12-01T11:07:29.3499650Z platform linux -- Python 3.10.4, pytest-7.1.3, pluggy-1.0.0 -- /opt/conda/bin/python 2022-12-01T11:07:29.3500013Z cachedir: .pytest_cache 2022-12-01T11:07:29.3500588Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-12-01T11:07:29.3501330Z torch: 1.13.0a0+gitc13d400 2022-12-01T11:07:29.3501657Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-12-01T11:07:29.3502231Z plugins: hypothesis-5.35.1, forked-1.4.0, rerunfailures-10.2, shard-0.1.2, xdist-2.5.0, xdoctest-1.0.2 2022-12-01T11:07:29.3502605Z collecting ... collected 3 items 2022-12-01T11:07:29.3503248Z Running 3 items in this shard: test/distributed/pipeline/sync/test_inplace.py::test_inplace_on_requires_grad, test/distributed/pipeline/sync/test_inplace.py::test_inplace_on_not_requires_grad, test/distributed/pipeline/sync/test_inplace.py::test_inplace_incorrect_grad 2022-12-01T11:07:29.3503748Z 2022-12-01T11:07:29.3503978Z distributed/pipeline/sync/test_inplace.py::test_inplace_on_requires_grad PASSED [ 33%] 2022-12-01T11:07:29.3504453Z distributed/pipeline/sync/test_inplace.py::test_inplace_on_not_requires_grad XFAIL [ 66%] 2022-12-01T11:07:29.3504899Z distributed/pipeline/sync/test_inplace.py::test_inplace_incorrect_grad XFAIL [100%] 2022-12-01T11:07:29.3505162Z 2022-12-01T11:07:29.3505329Z ========================= 1 passed, 2 xfailed in 0.16s ========================= 2022-12-01T11:07:29.3505531Z 2022-12-01T11:07:29.3505842Z ##[endgroup] 2022-12-01T11:07:29.3506498Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_inplace (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_inplace_5isheyu6) 2022-12-01T11:07:29.3506864Z 2022-12-01T11:07:29.3507153Z Running distributed/pipeline/sync/test_pipe ... [2022-12-01 11:07:29.349722] 2022-12-01T11:07:29.3507751Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_pipe.py', '-v'] ... 
[2022-12-01 11:07:29.350066] 2022-12-01T11:07:36.3080319Z 2022-12-01T11:07:36.3080874Z Expand the folded group to see the log file of distributed/pipeline/sync/test_pipe 2022-12-01T11:07:36.3081908Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_pipe (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_pipe_j3vhijrr) 2022-12-01T11:07:36.3082788Z ============================= test session starts ============================== 2022-12-01T11:07:36.3083685Z platform linux -- Python 3.10.4, pytest-7.1.3, pluggy-1.0.0 -- /opt/conda/bin/python 2022-12-01T11:07:36.3084092Z cachedir: .pytest_cache 2022-12-01T11:07:36.3084691Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-12-01T11:07:36.3085143Z torch: 1.13.0a0+gitc13d400 2022-12-01T11:07:36.3085489Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-12-01T11:07:36.3086063Z plugins: hypothesis-5.35.1, forked-1.4.0, rerunfailures-10.2, shard-0.1.2, xdist-2.5.0, xdoctest-1.0.2 2022-12-01T11:07:36.3086470Z collecting ... collected 56 items 2022-12-01T11:07:36.3093140Z Running 56 items in this shard: test/distributed/pipeline/sync/test_pipe.py::test_pipe_without_rpc, test/distributed/pipeline/sync/test_pipe.py::test_parameters, test/distributed/pipeline/sync/test_pipe.py::test_public_attrs, test/distributed/pipeline/sync/test_pipe.py::test_sequential_like, test/distributed/pipeline/sync/test_pipe.py::test_chunks_less_than_1, test/distributed/pipeline/sync/test_pipe.py::test_batch_size_indivisible, test/distributed/pipeline/sync/test_pipe.py::test_batch_size_small, test/distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode, test/distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode_invalid, test/distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode_when_chunks_1, test/distributed/pipeline/sync/test_pipe.py::test_checkpoint_eval, test/distributed/pipeline/sync/test_pipe.py::test_checkpoint_non_float_input, test/distributed/pipeline/sync/test_pipe.py::test_no_grad, test/distributed/pipeline/sync/test_pipe.py::test_exception, test/distributed/pipeline/sync/test_pipe.py::test_exception_early_stop_asap, test/distributed/pipeline/sync/test_pipe.py::test_nested_input, test/distributed/pipeline/sync/test_pipe.py::test_input_pair, test/distributed/pipeline/sync/test_pipe.py::test_multi_sequence_input, test/distributed/pipeline/sync/test_pipe.py::test_input_singleton, test/distributed/pipeline/sync/test_pipe.py::test_input_varargs, test/distributed/pipeline/sync/test_pipe.py::test_non_tensor, test/distributed/pipeline/sync/test_pipe.py::test_non_tensor_sequence, test/distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[never], test/distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[always], test/distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[except_last], test/distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[never], test/distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[always], test/distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[except_last], test/distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[never], test/distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[always], test/distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[except_last], test/distributed/pipeline/sync/test_pipe.py::test_no_chunk[never], test/distributed/pipeline/sync/test_pipe.py::test_no_chunk[always], 
test/distributed/pipeline/sync/test_pipe.py::test_no_chunk[except_last], test/distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[never], test/distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[always], test/distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[except_last], test/distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm_params[never], test/distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm_params[always], test/distributed/pipeline/sync/test_pipe.py::test_devices, test/distributed/pipeline/sync/test_pipe.py::test_partitions, test/distributed/pipeline/sync/test_pipe.py::test_merged_partitions, test/distributed/pipeline/sync/test_pipe.py::test_deny_moving, test/distributed/pipeline/sync/test_pipe.py::test_empty_module, test/distributed/pipeline/sync/test_pipe.py::test_named_children, test/distributed/pipeline/sync/test_pipe.py::test_verify_module_non_sequential, test/distributed/pipeline/sync/test_pipe.py::test_verify_module_duplicate_children, test/distributed/pipeline/sync/test_pipe.py::test_verify_module_params_on_same_device, test/distributed/pipeline/sync/test_pipe.py::test_verify_nested_modules, test/distributed/pipeline/sync/test_pipe.py::test_verify_module_duplicate_parameters_on_same_device, test/distributed/pipeline/sync/test_pipe.py::test_forward_lockstep, test/distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[never], test/distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[always], test/distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[except_last], test/distributed/pipeline/sync/test_pipe.py::test_inputs_wrong_device, test/distributed/pipeline/sync/test_pipe.py::test_with_device_wrapper 2022-12-01T11:07:36.3099639Z 2022-12-01T11:07:36.3099876Z distributed/pipeline/sync/test_pipe.py::test_pipe_without_rpc PASSED [ 1%] 2022-12-01T11:07:36.3100330Z distributed/pipeline/sync/test_pipe.py::test_parameters PASSED [ 3%] 2022-12-01T11:07:36.3100796Z distributed/pipeline/sync/test_pipe.py::test_public_attrs PASSED [ 5%] 2022-12-01T11:07:36.3101265Z distributed/pipeline/sync/test_pipe.py::test_sequential_like PASSED [ 7%] 2022-12-01T11:07:36.3101720Z distributed/pipeline/sync/test_pipe.py::test_chunks_less_than_1 PASSED [ 8%] 2022-12-01T11:07:36.3102201Z distributed/pipeline/sync/test_pipe.py::test_batch_size_indivisible PASSED [ 10%] 2022-12-01T11:07:36.3102670Z distributed/pipeline/sync/test_pipe.py::test_batch_size_small PASSED [ 12%] 2022-12-01T11:07:36.3103135Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode PASSED [ 14%] 2022-12-01T11:07:36.3103593Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode_invalid PASSED [ 16%] 2022-12-01T11:07:36.3104086Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode_when_chunks_1 PASSED [ 17%] 2022-12-01T11:07:36.3104570Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_eval PASSED [ 19%] 2022-12-01T11:07:36.3105117Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_non_float_input PASSED [ 21%] 2022-12-01T11:07:36.3105588Z distributed/pipeline/sync/test_pipe.py::test_no_grad PASSED [ 23%] 2022-12-01T11:07:36.3106039Z distributed/pipeline/sync/test_pipe.py::test_exception PASSED [ 25%] 2022-12-01T11:07:36.3106517Z distributed/pipeline/sync/test_pipe.py::test_exception_early_stop_asap PASSED [ 26%] 2022-12-01T11:07:36.3106978Z distributed/pipeline/sync/test_pipe.py::test_nested_input PASSED [ 28%] 2022-12-01T11:07:36.3107470Z distributed/pipeline/sync/test_pipe.py::test_input_pair PASSED [ 30%] 
2022-12-01T11:07:36.3107934Z distributed/pipeline/sync/test_pipe.py::test_multi_sequence_input PASSED [ 32%] 2022-12-01T11:07:36.3108399Z distributed/pipeline/sync/test_pipe.py::test_input_singleton PASSED [ 33%] 2022-12-01T11:07:36.3108846Z distributed/pipeline/sync/test_pipe.py::test_input_varargs PASSED [ 35%] 2022-12-01T11:07:36.3109304Z distributed/pipeline/sync/test_pipe.py::test_non_tensor PASSED [ 37%] 2022-12-01T11:07:36.3109764Z distributed/pipeline/sync/test_pipe.py::test_non_tensor_sequence PASSED [ 39%] 2022-12-01T11:07:36.3110226Z distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[never] PASSED [ 41%] 2022-12-01T11:07:36.3110715Z distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[always] PASSED [ 42%] 2022-12-01T11:07:36.3111207Z distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[except_last] PASSED [ 44%] 2022-12-01T11:07:36.3111697Z distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[never] PASSED [ 46%] 2022-12-01T11:07:36.3112162Z distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[always] PASSED [ 48%] 2022-12-01T11:07:36.3112653Z distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[except_last] PASSED [ 50%] 2022-12-01T11:07:36.3113141Z distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[never] PASSED [ 51%] 2022-12-01T11:07:36.3113614Z distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[always] PASSED [ 53%] 2022-12-01T11:07:36.3114111Z distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[except_last] PASSED [ 55%] 2022-12-01T11:07:36.3114653Z distributed/pipeline/sync/test_pipe.py::test_no_chunk[never] PASSED [ 57%] 2022-12-01T11:07:36.3115136Z distributed/pipeline/sync/test_pipe.py::test_no_chunk[always] PASSED [ 58%] 2022-12-01T11:07:36.3115587Z distributed/pipeline/sync/test_pipe.py::test_no_chunk[except_last] PASSED [ 60%] 2022-12-01T11:07:36.3116071Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[never] PASSED [ 62%] 2022-12-01T11:07:36.3116563Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[always] PASSED [ 64%] 2022-12-01T11:07:36.3117048Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[except_last] PASSED [ 66%] 2022-12-01T11:07:36.3117554Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm_params[never] PASSED [ 67%] 2022-12-01T11:07:36.3118073Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm_params[always] PASSED [ 69%] 2022-12-01T11:07:36.3118558Z distributed/pipeline/sync/test_pipe.py::test_devices PASSED [ 71%] 2022-12-01T11:07:36.3118996Z distributed/pipeline/sync/test_pipe.py::test_partitions PASSED [ 73%] 2022-12-01T11:07:36.3119466Z distributed/pipeline/sync/test_pipe.py::test_merged_partitions PASSED [ 75%] 2022-12-01T11:07:36.3119921Z distributed/pipeline/sync/test_pipe.py::test_deny_moving PASSED [ 76%] 2022-12-01T11:07:36.3120375Z distributed/pipeline/sync/test_pipe.py::test_empty_module PASSED [ 78%] 2022-12-01T11:07:36.3120822Z distributed/pipeline/sync/test_pipe.py::test_named_children PASSED [ 80%] 2022-12-01T11:07:36.3121301Z distributed/pipeline/sync/test_pipe.py::test_verify_module_non_sequential PASSED [ 82%] 2022-12-01T11:07:36.3121809Z distributed/pipeline/sync/test_pipe.py::test_verify_module_duplicate_children PASSED [ 83%] 2022-12-01T11:07:36.3122563Z distributed/pipeline/sync/test_pipe.py::test_verify_module_params_on_same_device PASSED [ 85%] 2022-12-01T11:07:36.3123063Z distributed/pipeline/sync/test_pipe.py::test_verify_nested_modules PASSED [ 87%] 
2022-12-01T11:07:36.3123562Z distributed/pipeline/sync/test_pipe.py::test_verify_module_duplicate_parameters_on_same_device PASSED [ 89%] 2022-12-01T11:07:36.3124051Z distributed/pipeline/sync/test_pipe.py::test_forward_lockstep PASSED [ 91%] 2022-12-01T11:07:36.3124485Z distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[never] PASSED [ 92%] 2022-12-01T11:07:36.3124938Z distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[always] PASSED [ 94%] 2022-12-01T11:07:36.3125399Z distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[except_last] PASSED [ 96%] 2022-12-01T11:07:36.3125836Z distributed/pipeline/sync/test_pipe.py::test_inputs_wrong_device PASSED [ 98%] 2022-12-01T11:07:36.3126276Z distributed/pipeline/sync/test_pipe.py::test_with_device_wrapper PASSED [100%] 2022-12-01T11:07:36.3126528Z 2022-12-01T11:07:36.3126685Z ============================== 56 passed in 4.86s ============================== 2022-12-01T11:07:36.3126878Z 2022-12-01T11:07:36.3127177Z ##[endgroup] 2022-12-01T11:07:36.3127829Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_pipe (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_pipe_j3vhijrr) 2022-12-01T11:07:36.3128210Z 2022-12-01T11:07:36.3128506Z Running distributed/pipeline/sync/test_transparency ... [2022-12-01 11:07:36.308000] 2022-12-01T11:07:36.3129142Z Executing ['/opt/conda/bin/python', '-bb', '-m', 'pytest', 'distributed/pipeline/sync/test_transparency.py', '-v'] ... [2022-12-01 11:07:36.308403] 2022-12-01T11:07:38.3700133Z 2022-12-01T11:07:38.3700851Z Expand the folded group to see the log file of distributed/pipeline/sync/test_transparency 2022-12-01T11:07:38.3702037Z ##[group]PRINTING LOG FILE of distributed/pipeline/sync/test_transparency (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_transparency_ww88gkkd) 2022-12-01T11:07:38.3702788Z ============================= test session starts ============================== 2022-12-01T11:07:38.3703795Z platform linux -- Python 3.10.4, pytest-7.1.3, pluggy-1.0.0 -- /opt/conda/bin/python 2022-12-01T11:07:38.3704191Z cachedir: .pytest_cache 2022-12-01T11:07:38.3704810Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-12-01T11:07:38.3705282Z torch: 1.13.0a0+gitc13d400 2022-12-01T11:07:38.3705796Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-12-01T11:07:38.3706359Z plugins: hypothesis-5.35.1, forked-1.4.0, rerunfailures-10.2, shard-0.1.2, xdist-2.5.0, xdoctest-1.0.2 2022-12-01T11:07:38.3706755Z collecting ... collected 1 item 2022-12-01T11:07:38.3707174Z Running 1 items in this shard: test/distributed/pipeline/sync/test_transparency.py::test_simple_linears 2022-12-01T11:07:38.3707455Z 2022-12-01T11:07:38.3707663Z distributed/pipeline/sync/test_transparency.py::test_simple_linears PASSED [100%] 2022-12-01T11:07:38.3707930Z 2022-12-01T11:07:38.3708090Z ============================== 1 passed in 0.13s =============================== 2022-12-01T11:07:38.3708293Z 2022-12-01T11:07:38.3708616Z ##[endgroup] 2022-12-01T11:07:38.3709273Z FINISHED PRINTING LOG FILE of distributed/pipeline/sync/test_transparency (/var/lib/jenkins/workspace/test/test-reports/distributed-pipeline-sync-test_transparency_ww88gkkd) 2022-12-01T11:07:38.3709683Z 2022-12-01T11:07:38.3709975Z Running distributed/rpc/test_tensorpipe_agent ... 
[2022-12-01 11:07:38.370060] 2022-12-01T11:07:38.3710712Z Executing ['/opt/conda/bin/python', '-bb', 'distributed/rpc/test_tensorpipe_agent.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2022-12-01 11:07:38.370372] 2022-12-01T11:07:40.4589008Z 2022-12-01T11:07:40.4589783Z Expand the folded group to see the log file of distributed/rpc/test_tensorpipe_agent 2022-12-01T11:07:40.4591142Z ##[group]PRINTING LOG FILE of distributed/rpc/test_tensorpipe_agent (/var/lib/jenkins/workspace/test/test-reports/distributed-rpc-test_tensorpipe_agent_g4z4z9wb) 2022-12-01T11:07:40.4591790Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpabq1lajb 2022-12-01T11:07:40.4592342Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpabq1lajb/_remote_module_non_scriptable.py 2022-12-01T11:07:40.4592664Z 2022-12-01T11:07:40.4592969Z ##[endgroup] 2022-12-01T11:07:40.4593684Z FINISHED PRINTING LOG FILE of distributed/rpc/test_tensorpipe_agent (/var/lib/jenkins/workspace/test/test-reports/distributed-rpc-test_tensorpipe_agent_g4z4z9wb) 2022-12-01T11:07:40.4594056Z 2022-12-01T11:07:40.7031818Z 2022-12-01T11:07:40.7032538Z real 47m2.441s 2022-12-01T11:07:40.7032966Z user 85m56.256s 2022-12-01T11:07:40.7033200Z sys 45m27.707s 2022-12-01T11:07:40.7033438Z + assert_git_not_dirty 2022-12-01T11:07:40.7033969Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 != *rocm* ]] 2022-12-01T11:07:40.7034413Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 != *xla* ]] 2022-12-01T11:07:40.7035692Z ++ git status --porcelain 2022-12-01T11:07:41.8745888Z + git_status= 2022-12-01T11:07:41.8746518Z + [[ -n '' ]] 2022-12-01T11:07:41.8747001Z + [[ linux-bionic-cuda11.6-py3.10-gcc7 == *cuda* ]] 2022-12-01T11:07:41.8747318Z + [[ 3 == 1 ]] 2022-12-01T11:07:41.8747543Z + [[ 3 == 1 ]] 2022-12-01T11:07:41.8818450Z Prepare all required actions 2022-12-01T11:07:41.8818885Z Getting action download info 2022-12-01T11:07:42.0871194Z ##[group]Run ./.github/actions/get-workflow-job-id 2022-12-01T11:07:42.0871491Z with: 2022-12-01T11:07:42.0871962Z github-token: *** 2022-12-01T11:07:42.0872184Z env: 2022-12-01T11:07:42.0872422Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:07:42.0872689Z GPU_FLAG: --gpus all 2022-12-01T11:07:42.0872918Z ##[endgroup] 2022-12-01T11:07:42.0908480Z ##[group]Run nick-fields/retry@7d4a37704547a311dbb66ebdf5b23ec19374a767 2022-12-01T11:07:42.0908802Z with: 2022-12-01T11:07:42.0909024Z shell: bash 2022-12-01T11:07:42.0909261Z timeout_minutes: 10 2022-12-01T11:07:42.0909509Z max_attempts: 5 2022-12-01T11:07:42.0909762Z retry_wait_seconds: 30 2022-12-01T11:07:42.0910296Z command: set -eux python3 -m pip install requests==2.26.0 GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}") echo "::set-output name=job-id::${GHA_WORKFLOW_JOB_ID}" 2022-12-01T11:07:42.0910803Z polling_interval_seconds: 1 2022-12-01T11:07:42.0911053Z warning_on_retry: true 2022-12-01T11:07:42.0911311Z continue_on_error: false 2022-12-01T11:07:42.0911551Z env: 2022-12-01T11:07:42.0911767Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:07:42.0912027Z GPU_FLAG: --gpus all 2022-12-01T11:07:42.0912424Z GITHUB_TOKEN: *** 2022-12-01T11:07:42.0912668Z ##[endgroup] 2022-12-01T11:07:42.1467735Z 2022-12-01T11:07:42.1476459Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. 
For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/ 2022-12-01T11:07:42.1533087Z + python3 -m pip install requests==2.26.0 2022-12-01T11:07:42.4532432Z Defaulting to user installation because normal site-packages is not writeable 2022-12-01T11:07:42.4770044Z Requirement already satisfied: requests==2.26.0 in /home/ec2-user/.local/lib/python3.7/site-packages (2.26.0) 2022-12-01T11:07:42.4960214Z Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2.0.12) 2022-12-01T11:07:42.4987680Z Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2022.9.24) 2022-12-01T11:07:42.5000013Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (1.26.13) 2022-12-01T11:07:42.5227807Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (3.4) 2022-12-01T11:07:42.7774837Z ++ python3 .github/scripts/get_workflow_job_id.py 3591403534 i-0eaaa5984457e9076 2022-12-01T11:07:44.7433362Z + GHA_WORKFLOW_JOB_ID=9818608637 2022-12-01T11:07:44.7434484Z + echo '::set-output name=job-id::9818608637' 2022-12-01T11:07:44.7436430Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/ 2022-12-01T11:07:45.1549254Z Command completed after 1 attempt(s). 2022-12-01T11:07:45.1549662Z 2022-12-01T11:07:45.1552005Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. 
For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/ 2022-12-01T11:07:45.1686058Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2022-12-01T11:07:45.1686420Z kill "$MONITOR_SCRIPT_PID" 2022-12-01T11:07:45.1699556Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T11:07:45.1699854Z env: 2022-12-01T11:07:45.1700096Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:07:45.1700343Z GPU_FLAG: --gpus all 2022-12-01T11:07:45.1700606Z MONITOR_SCRIPT_PID: 125159 2022-12-01T11:07:45.1700860Z ##[endgroup] 2022-12-01T11:07:45.1812201Z Prepare all required actions 2022-12-01T11:07:45.1812575Z Getting action download info 2022-12-01T11:07:45.3568124Z Download action repository 'actions/upload-artifact@v2' (SHA:82c141cc518b40d92cc801eee768e7aafc9c2fa2) 2022-12-01T11:07:45.5168436Z ##[group]Run ./.github/actions/upload-test-artifacts 2022-12-01T11:07:45.5168733Z with: 2022-12-01T11:07:45.5169084Z file-suffix: test-distributed-3-3-linux.8xlarge.nvidia.gpu_9818608637 2022-12-01T11:07:45.5169426Z env: 2022-12-01T11:07:45.5169641Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:07:45.5169919Z GPU_FLAG: --gpus all 2022-12-01T11:07:45.5170166Z ##[endgroup] 2022-12-01T11:07:45.5201913Z ##[group]Run # Remove any previous test jsons if they exist 2022-12-01T11:07:45.5202291Z # Remove any previous test jsons if they exist 2022-12-01T11:07:45.5203161Z rm -f test-jsons-*.zip 2022-12-01T11:07:45.5203500Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' 2022-12-01T11:07:45.5215643Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T11:07:45.5215945Z env: 2022-12-01T11:07:45.5216171Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:07:45.5216441Z GPU_FLAG: --gpus all 2022-12-01T11:07:45.5216812Z FILE_SUFFIX: test-distributed-3-3-linux.8xlarge.nvidia.gpu_9818608637 2022-12-01T11:07:45.5217150Z ##[endgroup] 2022-12-01T11:07:45.5321836Z adding: test/allowlist_for_publicAPI.json (deflated 80%) 2022-12-01T11:07:45.5357539Z adding: test/benchmark_utils/callgrind_artifacts.json (deflated 92%) 2022-12-01T11:07:45.5365544Z adding: test/profiler/profiler_utils_mock_events.json (deflated 87%) 2022-12-01T11:07:45.5366805Z adding: test/.pytorch-slow-tests.json (deflated 76%) 2022-12-01T11:07:45.5377761Z adding: test/.pytorch-disabled-tests.json (deflated 86%) 2022-12-01T11:07:45.5404746Z ##[group]Run # Remove any previous test reports if they exist 2022-12-01T11:07:45.5405137Z # Remove any previous test reports if they exist 2022-12-01T11:07:45.5405469Z rm -f test-reports-*.zip 2022-12-01T11:07:45.5405814Z zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' 2022-12-01T11:07:45.5417676Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T11:07:45.5417980Z env: 2022-12-01T11:07:45.5418226Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:07:45.5418480Z GPU_FLAG: --gpus all 2022-12-01T11:07:45.5418854Z FILE_SUFFIX: test-distributed-3-3-linux.8xlarge.nvidia.gpu_9818608637 2022-12-01T11:07:45.5419216Z ##[endgroup] 2022-12-01T11:07:45.5520488Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102046.xml (deflated 39%) 2022-12-01T11:07:45.5521389Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102054.xml (deflated 39%) 2022-12-01T11:07:45.5522072Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102103.xml (deflated 39%) 2022-12-01T11:07:45.5523220Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102109.xml (deflated 38%) 2022-12-01T11:07:45.5523891Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102117.xml (deflated 38%) 2022-12-01T11:07:45.5524553Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102123.xml (deflated 39%) 2022-12-01T11:07:45.5525212Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102129.xml (deflated 39%) 2022-12-01T11:07:45.5525877Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102134.xml (deflated 39%) 2022-12-01T11:07:45.5526527Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102140.xml (deflated 38%) 2022-12-01T11:07:45.5527184Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102148.xml (deflated 38%) 2022-12-01T11:07:45.5527953Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102156.xml (deflated 38%) 2022-12-01T11:07:45.5528657Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102203.xml (deflated 38%) 2022-12-01T11:07:45.5529285Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102209.xml (deflated 39%) 2022-12-01T11:07:45.5529946Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102218.xml (deflated 39%) 2022-12-01T11:07:45.5530606Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102226.xml (deflated 39%) 2022-12-01T11:07:45.5531265Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102232.xml (deflated 39%) 2022-12-01T11:07:45.5531953Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102239.xml (deflated 38%) 2022-12-01T11:07:45.5532630Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102247.xml (deflated 38%) 2022-12-01T11:07:45.5533282Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20221201102256.xml (deflated 38%) 2022-12-01T11:07:45.5533960Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102303.xml (deflated 38%) 2022-12-01T11:07:45.5534636Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102312.xml (deflated 38%) 2022-12-01T11:07:45.5535326Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102321.xml (deflated 38%) 2022-12-01T11:07:45.5536017Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102329.xml (deflated 38%) 2022-12-01T11:07:45.5536694Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102338.xml (deflated 38%) 2022-12-01T11:07:45.5537365Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102347.xml (deflated 38%) 2022-12-01T11:07:45.5538064Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CompilerTest-20221201102355.xml (deflated 38%) 2022-12-01T11:07:45.5538819Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102404.xml (deflated 42%) 2022-12-01T11:07:45.5539632Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102413.xml (deflated 41%) 2022-12-01T11:07:45.5540537Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102422.xml (deflated 41%) 2022-12-01T11:07:45.5541331Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102432.xml (deflated 41%) 2022-12-01T11:07:45.5542142Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102441.xml (deflated 41%) 2022-12-01T11:07:45.5542935Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102449.xml (deflated 42%) 2022-12-01T11:07:45.5543713Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102458.xml (deflated 41%) 2022-12-01T11:07:45.5544516Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102507.xml (deflated 41%) 2022-12-01T11:07:45.5545315Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102515.xml (deflated 42%) 2022-12-01T11:07:45.5546118Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102524.xml (deflated 45%) 2022-12-01T11:07:45.5546985Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102532.xml (deflated 45%) 2022-12-01T11:07:45.5547808Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102540.xml (deflated 43%) 2022-12-01T11:07:45.5548599Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102548.xml (deflated 43%) 2022-12-01T11:07:45.5549395Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102557.xml (deflated 45%) 2022-12-01T11:07:45.5550195Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102605.xml (deflated 46%) 2022-12-01T11:07:45.5550969Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102613.xml (deflated 47%) 2022-12-01T11:07:45.5551759Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102621.xml (deflated 47%) 2022-12-01T11:07:45.5552554Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102629.xml (deflated 44%) 2022-12-01T11:07:45.5553336Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102637.xml (deflated 45%) 2022-12-01T11:07:45.5554106Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102646.xml (deflated 45%) 2022-12-01T11:07:45.5554885Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102654.xml (deflated 44%) 2022-12-01T11:07:45.5555682Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102702.xml (deflated 44%) 2022-12-01T11:07:45.5556474Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102710.xml (deflated 42%) 2022-12-01T11:07:45.5557249Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102719.xml (deflated 42%) 2022-12-01T11:07:45.5558036Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102728.xml (deflated 42%) 2022-12-01T11:07:45.5558816Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102736.xml (deflated 45%) 2022-12-01T11:07:45.5559620Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102745.xml (deflated 44%) 2022-12-01T11:07:45.5560482Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102754.xml (deflated 42%) 2022-12-01T11:07:45.5561276Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102800.xml (deflated 41%) 2022-12-01T11:07:45.5562069Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102809.xml (deflated 42%) 2022-12-01T11:07:45.5563286Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102815.xml (deflated 42%) 2022-12-01T11:07:45.5564062Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102823.xml (deflated 41%) 2022-12-01T11:07:45.5564853Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102832.xml (deflated 42%) 2022-12-01T11:07:45.5565642Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102841.xml (deflated 42%) 2022-12-01T11:07:45.5566434Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102845.xml (deflated 42%) 2022-12-01T11:07:45.5567325Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102849.xml (deflated 42%) 2022-12-01T11:07:45.5568123Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102853.xml (deflated 42%) 2022-12-01T11:07:45.5568900Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102856.xml (deflated 42%) 2022-12-01T11:07:45.5569687Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102900.xml (deflated 42%) 2022-12-01T11:07:45.5570478Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102904.xml (deflated 42%) 2022-12-01T11:07:45.5571254Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102913.xml (deflated 41%) 2022-12-01T11:07:45.5572041Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102922.xml (deflated 42%) 2022-12-01T11:07:45.5572832Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102931.xml (deflated 41%) 2022-12-01T11:07:45.5573612Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102940.xml (deflated 41%) 2022-12-01T11:07:45.5574384Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102951.xml (deflated 42%) 2022-12-01T11:07:45.5575173Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201102957.xml (deflated 41%) 2022-12-01T11:07:45.5575965Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103003.xml (deflated 42%) 2022-12-01T11:07:45.5576754Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103012.xml (deflated 42%) 2022-12-01T11:07:45.5577528Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103021.xml (deflated 42%) 2022-12-01T11:07:45.5578313Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103030.xml (deflated 42%) 2022-12-01T11:07:45.5579097Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103039.xml (deflated 42%) 2022-12-01T11:07:45.5580039Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103045.xml (deflated 42%) 2022-12-01T11:07:45.5580826Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103051.xml (deflated 41%) 2022-12-01T11:07:45.5581601Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103059.xml (deflated 43%) 2022-12-01T11:07:45.5582385Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103105.xml (deflated 43%) 2022-12-01T11:07:45.5583172Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103114.xml (deflated 42%) 2022-12-01T11:07:45.5583958Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103123.xml (deflated 43%) 2022-12-01T11:07:45.5584732Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103148.xml (deflated 44%) 2022-12-01T11:07:45.5585512Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103157.xml (deflated 42%) 2022-12-01T11:07:45.5586365Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103206.xml (deflated 42%) 2022-12-01T11:07:45.5587179Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103212.xml (deflated 41%) 2022-12-01T11:07:45.5587950Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103220.xml (deflated 41%) 2022-12-01T11:07:45.5588727Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103229.xml (deflated 41%) 2022-12-01T11:07:45.5589511Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20221201103239.xml (deflated 41%) 2022-12-01T11:07:45.5590291Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103249.xml (deflated 41%) 2022-12-01T11:07:45.5591031Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103255.xml (deflated 41%) 2022-12-01T11:07:45.5591795Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103301.xml (deflated 42%) 2022-12-01T11:07:45.5592553Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103303.xml (deflated 42%) 2022-12-01T11:07:45.5593308Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103309.xml (deflated 41%) 2022-12-01T11:07:45.5594036Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103315.xml (deflated 41%) 2022-12-01T11:07:45.5594800Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103321.xml (deflated 41%) 2022-12-01T11:07:45.5595549Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103327.xml (deflated 42%) 2022-12-01T11:07:45.5596299Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20221201103330.xml (deflated 41%) 2022-12-01T11:07:45.5597155Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclProcessGroupWithDispatchedCollectivesTests-20221201103336.xml (deflated 44%) 2022-12-01T11:07:45.5598037Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLNoGPUTest-20221201103343.xml (deflated 41%) 2022-12-01T11:07:45.5598811Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103345.xml (deflated 39%) 2022-12-01T11:07:45.5599649Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103353.xml (deflated 38%) 2022-12-01T11:07:45.5600379Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103401.xml (deflated 38%) 2022-12-01T11:07:45.5601136Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103410.xml (deflated 39%) 2022-12-01T11:07:45.5601892Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103419.xml (deflated 39%) 2022-12-01T11:07:45.5603002Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103427.xml (deflated 39%) 2022-12-01T11:07:45.5603748Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103436.xml (deflated 39%) 2022-12-01T11:07:45.5604496Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103445.xml (deflated 38%) 2022-12-01T11:07:45.5605239Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103452.xml (deflated 38%) 2022-12-01T11:07:45.5606093Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103501.xml (deflated 39%) 2022-12-01T11:07:45.5606849Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103514.xml (deflated 39%) 2022-12-01T11:07:45.5607597Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103522.xml (deflated 39%) 2022-12-01T11:07:45.5608342Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103530.xml (deflated 38%) 2022-12-01T11:07:45.5609084Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103539.xml (deflated 39%) 2022-12-01T11:07:45.5609811Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103547.xml (deflated 39%) 2022-12-01T11:07:45.5610547Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103555.xml (deflated 39%) 2022-12-01T11:07:45.5611292Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103604.xml (deflated 39%) 2022-12-01T11:07:45.5612065Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20221201103616.xml (deflated 39%) 2022-12-01T11:07:45.5612800Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-RendezvousEnvTest-20221201103624.xml (deflated 40%) 2022-12-01T11:07:45.5613477Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-TimeoutTest-20221201103628.xml (deflated 40%) 2022-12-01T11:07:45.5614184Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestHooks-20221201103634.xml (deflated 80%) 2022-12-01T11:07:45.5614895Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestNoGrad-20221201103634.xml (deflated 64%) 2022-12-01T11:07:45.5615617Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParamInit-20221201103634.xml (deflated 61%) 2022-12-01T11:07:45.5616348Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParityWithDDP-20221201103634.xml (deflated 91%) 2022-12-01T11:07:45.5617122Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_state_dict/TEST-TestFSDPStateDict-20221201104235.xml (deflated 94%) 2022-12-01T11:07:45.5617900Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_optim_state/TEST-TestFSDPOptimState-20221201104652.xml (deflated 93%) 2022-12-01T11:07:45.5618688Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_checkpoint/TEST-TestFSDPCheckpoint-20221201105025.xml (deflated 77%) 2022-12-01T11:07:45.5619533Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMisc-20221201105056.xml (deflated 77%) 2022-12-01T11:07:45.5620265Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_grad_acc/TEST-TestGradAcc-20221201105158.xml (deflated 93%) 2022-12-01T11:07:45.5621054Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_freezing_weights/TEST-TestFreezingWeights-20221201105249.xml (deflated 84%) 2022-12-01T11:07:45.5621850Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_exec_order/TEST-TestFSDPExecOrder-20221201105332.xml (deflated 83%) 2022-12-01T11:07:45.5622733Z adding: test/test-reports/python-unittest/distributed.algorithms.ddp_comm_hooks.test_ddp_hooks/TEST-DistributedDataParallelCommHookTest-20221201105407.xml (deflated 79%) 2022-12-01T11:07:45.5623611Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_clip_grad_norm/TEST-TestCalcuGradNorm-20221201105440.xml (deflated 84%) 2022-12-01T11:07:45.5624406Z adding: 
test/test-reports/python-unittest/distributed.fsdp.test_fsdp_clip_grad_norm/TEST-TestClipGradNorm-20221201105440.xml (deflated 86%) 2022-12-01T11:07:45.5625297Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_ignored_modules/TEST-TestFSDPIgnoredModules-20221201105546.xml (deflated 75%) 2022-12-01T11:07:45.5626066Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_apply/TEST-TestApply-20221201105609.xml (deflated 61%) 2022-12-01T11:07:45.5626868Z adding: test/test-reports/python-unittest/distributed.fsdp.test_distributed_checkpoint/TEST-TestDistributedCheckpoint-20221201105624.xml (deflated 59%) 2022-12-01T11:07:45.5627699Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_multiple_forward/TEST-TestMultiForward-20221201105634.xml (deflated 41%) 2022-12-01T11:07:45.5628481Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_uneven/TEST-TestUnevenParamShard-20221201105641.xml (deflated 41%) 2022-12-01T11:07:45.5629302Z adding: test/test-reports/python-unittest/distributed.fsdp.test_checkpoint_wrapper/TEST-CheckpointWrapperTest-20221201105649.xml (deflated 68%) 2022-12-01T11:07:45.5630083Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_fx/TEST-TestSymbolicTracing-20221201105653.xml (deflated 45%) 2022-12-01T11:07:45.5630917Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestDistributedCheckpointing-20221201105700.xml (deflated 56%) 2022-12-01T11:07:45.5631828Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestDistributedFailure-20221201105700.xml (deflated 78%) 2022-12-01T11:07:45.5632630Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_planner/TEST-TestSavePlan-20221201105721.xml (deflated 71%) 2022-12-01T11:07:45.5633455Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestCreateTensorFromParams-20221201105725.xml (deflated 43%) 2022-12-01T11:07:45.5634338Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorMetadata-20221201105725.xml (deflated 44%) 2022-12-01T11:07:45.5635187Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestLocalTensor-20221201105725.xml (deflated 59%) 2022-12-01T11:07:45.5636000Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestModuleHookApi-20221201105725.xml (deflated 58%) 2022-12-01T11:07:45.5636798Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardMetadata-20221201105725.xml (deflated 58%) 2022-12-01T11:07:45.5637613Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardParameter-20221201105725.xml (deflated 60%) 2022-12-01T11:07:45.5638515Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardTensor-20221201105725.xml (deflated 60%) 2022-12-01T11:07:45.5639389Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorChunked-20221201105725.xml (deflated 89%) 2022-12-01T11:07:45.5640274Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorCustomOps-20221201105725.xml (deflated 69%) 2022-12-01T11:07:45.5641143Z adding: 
test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorEnumerable-20221201105725.xml (deflated 87%) 2022-12-01T11:07:45.5642061Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalShards-20221201105725.xml (deflated 82%) 2022-12-01T11:07:45.5643401Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalTensor-20221201105725.xml (deflated 61%) 2022-12-01T11:07:45.5644245Z adding: test/test-reports/python-unittest/distributed.test_c10d_pypg/TEST-TestDDPWithWorkSubclass-20221201105926.xml (deflated 84%) 2022-12-01T11:07:45.5645120Z adding: test/test-reports/python-unittest/distributed.test_c10d_pypg/TEST-TestDDPWithWorkWrapper-20221201105926.xml (deflated 84%) 2022-12-01T11:07:45.5646185Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110149.xml (deflated 40%) 2022-12-01T11:07:45.5647047Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110155.xml (deflated 41%) 2022-12-01T11:07:45.5647867Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110202.xml (deflated 41%) 2022-12-01T11:07:45.5648693Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110208.xml (deflated 40%) 2022-12-01T11:07:45.5649480Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110214.xml (deflated 40%) 2022-12-01T11:07:45.5650293Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110220.xml (deflated 41%) 2022-12-01T11:07:45.5651103Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110227.xml (deflated 41%) 2022-12-01T11:07:45.5651908Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110233.xml (deflated 41%) 2022-12-01T11:07:45.5652688Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20221201110239.xml (deflated 40%) 2022-12-01T11:07:45.5653510Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20221201110245.xml (deflated 39%) 2022-12-01T11:07:45.5654313Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20221201110251.xml (deflated 40%) 2022-12-01T11:07:45.5655122Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20221201110259.xml (deflated 39%) 2022-12-01T11:07:45.5655909Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20221201110307.xml (deflated 40%) 2022-12-01T11:07:45.5656727Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20221201110316.xml (deflated 40%) 2022-12-01T11:07:45.5657606Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20221201110329.xml (deflated 44%) 2022-12-01T11:07:45.5658665Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20221201110332.xml 
(deflated 44%) 2022-12-01T11:07:45.5659567Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20221201110336.xml (deflated 43%) 2022-12-01T11:07:45.5660455Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110341.xml (deflated 42%) 2022-12-01T11:07:45.5661302Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110349.xml (deflated 42%) 2022-12-01T11:07:45.5662151Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110357.xml (deflated 42%) 2022-12-01T11:07:45.5662974Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110405.xml (deflated 42%) 2022-12-01T11:07:45.5663819Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110413.xml (deflated 42%) 2022-12-01T11:07:45.5664662Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110421.xml (deflated 42%) 2022-12-01T11:07:45.5665578Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110429.xml (deflated 42%) 2022-12-01T11:07:45.5666436Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20221201110437.xml (deflated 42%) 2022-12-01T11:07:45.5667293Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops/TEST-TestShardedTensorMatrixOps-20221201110444.xml (deflated 86%) 2022-12-01T11:07:45.5668138Z adding: test/test-reports/python-unittest/distributed.test_c10d_object_collectives/TEST-TestObjectCollectives-20221201110509.xml (deflated 69%) 2022-12-01T11:07:45.5668948Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_tensor_ops/TEST-TestTensorOps-20221201110526.xml (deflated 75%) 2022-12-01T11:07:45.5669744Z adding: test/test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorOps-20221201110539.xml (deflated 67%) 2022-12-01T11:07:45.5670534Z adding: test/test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorReshard-20221201110539.xml (deflated 60%) 2022-12-01T11:07:45.5671358Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_example/TEST-LocalTimerExample-20221201110552.xml (deflated 54%) 2022-12-01T11:07:45.5672219Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_linear/TEST-TestShardedTensorOpsLinear-20221201110603.xml (deflated 68%) 2022-12-01T11:07:45.5673065Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax/TEST-TestShardedSoftmax-20221201110613.xml (deflated 59%) 2022-12-01T11:07:45.5673894Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding_bag/TEST-TestShardedEmbeddingBag-20221201110620.xml (deflated 60%) 2022-12-01T11:07:45.5674738Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard/TEST-TestReshard-20221201110627.xml (deflated 61%) 2022-12-01T11:07:45.5675555Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerServerTest-20221201110634.xml 
(deflated 66%) 2022-12-01T11:07:45.5676364Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerTest-20221201110634.xml (deflated 69%) 2022-12-01T11:07:45.5677227Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-MultiprocessingRequestQueueTest-20221201110634.xml (deflated 66%) 2022-12-01T11:07:45.5678174Z adding: test/test-reports/python-unittest/distributed.elastic.utils.distributed_test/TEST-DistributedUtilTest-20221201110641.xml (deflated 71%) 2022-12-01T11:07:45.5678982Z adding: test/test-reports/python-unittest/distributed.rpc.test_share_memory/TEST-TestRPCPickler-20221201110649.xml (deflated 38%) 2022-12-01T11:07:45.5679736Z adding: test/test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-StoreUtilTest-20221201110655.xml (deflated 62%) 2022-12-01T11:07:45.5680482Z adding: test/test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-UtilTest-20221201110655.xml (deflated 69%) 2022-12-01T11:07:45.5681211Z adding: test/test-reports/python-unittest/distributed.nn.jit.test_instantiator/TEST-TestInstantiator-20221201110659.xml (deflated 63%) 2022-12-01T11:07:45.5681965Z adding: test/test-reports/python-unittest/distributed.test_launcher/TEST-TestDistributedLaunch-20221201110702.xml (deflated 43%) 2022-12-01T11:07:45.5704902Z ##[group]Run # Remove any previous test reports if they exist 2022-12-01T11:07:45.5705307Z # Remove any previous test reports if they exist 2022-12-01T11:07:45.5705629Z rm -f usage-log-*.zip 2022-12-01T11:07:45.5706005Z # this workflow is also run in bazel build test, but we dont generate usage reports for it 2022-12-01T11:07:45.5706492Z # so check to see if the file exists first 2022-12-01T11:07:45.5706808Z if [ -f 'usage_log.txt' ]; then 2022-12-01T11:07:45.5707143Z  zip "usage-log-${FILE_SUFFIX}.zip" 'usage_log.txt' 2022-12-01T11:07:45.5707437Z fi 2022-12-01T11:07:45.5719087Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T11:07:45.5719386Z env: 2022-12-01T11:07:45.5719630Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:07:45.5719886Z GPU_FLAG: --gpus all 2022-12-01T11:07:45.5720270Z FILE_SUFFIX: test-distributed-3-3-linux.8xlarge.nvidia.gpu_9818608637 2022-12-01T11:07:45.5720627Z ##[endgroup] 2022-12-01T11:07:45.6178329Z adding: usage_log.txt (deflated 95%) 2022-12-01T11:07:45.6225831Z ##[group]Run seemethere/upload-artifact-s3@v5 2022-12-01T11:07:45.6226123Z with: 2022-12-01T11:07:45.6226382Z s3-prefix: pytorch/pytorch/3591403534/1/artifact 2022-12-01T11:07:45.6226677Z retention-days: 14 2022-12-01T11:07:45.6226942Z if-no-files-found: warn 2022-12-01T11:07:45.6227195Z path: test-jsons-*.zip 2022-12-01T11:07:45.6227451Z name: artifact 2022-12-01T11:07:45.6227701Z s3-bucket: gha-artifacts 2022-12-01T11:07:45.6227943Z region: us-east-1 2022-12-01T11:07:45.6228170Z env: 2022-12-01T11:07:45.6228405Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:07:45.6228648Z GPU_FLAG: --gpus all 2022-12-01T11:07:45.6228890Z ##[endgroup] 2022-12-01T11:07:46.0717958Z NOTE: s3-prefix specified, ignoring name parameter 2022-12-01T11:07:46.0718747Z With the provided path, there will be 1 file uploaded 2022-12-01T11:07:46.0719116Z Uploading to s3 prefix: pytorch/pytorch/3591403534/1/artifact 2022-12-01T11:07:46.0730698Z Starting upload of test-jsons-test-distributed-3-3-linux.8xlarge.nvidia.gpu_9818608637.zip 2022-12-01T11:07:46.2024957Z Finished upload of test-jsons-test-distributed-3-3-linux.8xlarge.nvidia.gpu_9818608637.zip 2022-12-01T11:07:46.2217739Z 
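Note on the two steps above: the job packs the per-suite JUnit XML reports (and usage_log.txt) into deflated zip archives named after the shard, then the seemethere/upload-artifact-s3 action pushes each archive to the gha-artifacts bucket under the run-scoped prefix pytorch/pytorch/3591403534/1/artifact. A minimal Python sketch of that zip-and-upload pattern follows; the constants mirror values visible in this log, and the code is purely illustrative, not the action's implementation.

import glob
import os
import zipfile

import boto3  # assumes AWS credentials are available, as they are on the CI runner

# Values copied from this log for illustration; in the workflow they come from env/inputs.
FILE_SUFFIX = "test-distributed-3-3-linux.8xlarge.nvidia.gpu_9818608637"
BUCKET = "gha-artifacts"
PREFIX = "pytorch/pytorch/3591403534/1/artifact"

archive = f"test-reports-{FILE_SUFFIX}.zip"

# Pack every JUnit XML report into one deflated archive, like the "adding: ..." lines above.
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for report in sorted(glob.glob("test/test-reports/**/*.xml", recursive=True)):
        zf.write(report)

# Upload the archive under the run-scoped prefix, as the upload-artifact-s3 step does.
boto3.client("s3").upload_file(archive, BUCKET, f"{PREFIX}/{os.path.basename(archive)}")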
##[group]Run seemethere/upload-artifact-s3@v5 2022-12-01T11:07:46.2218032Z with: 2022-12-01T11:07:46.2218295Z s3-prefix: pytorch/pytorch/3591403534/1/artifact 2022-12-01T11:07:46.2218604Z retention-days: 14 2022-12-01T11:07:46.2218875Z if-no-files-found: error 2022-12-01T11:07:46.2219141Z path: test-reports-*.zip 2022-12-01T11:07:46.2219392Z name: artifact 2022-12-01T11:07:46.2219639Z s3-bucket: gha-artifacts 2022-12-01T11:07:46.2219882Z region: us-east-1 2022-12-01T11:07:46.2220111Z env: 2022-12-01T11:07:46.2220348Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:07:46.2220595Z GPU_FLAG: --gpus all 2022-12-01T11:07:46.2220840Z ##[endgroup] 2022-12-01T11:07:46.6672496Z NOTE: s3-prefix specified, ignoring name parameter 2022-12-01T11:07:46.6673789Z With the provided path, there will be 1 file uploaded 2022-12-01T11:07:46.6674313Z Uploading to s3 prefix: pytorch/pytorch/3591403534/1/artifact 2022-12-01T11:07:46.6685293Z Starting upload of test-reports-test-distributed-3-3-linux.8xlarge.nvidia.gpu_9818608637.zip 2022-12-01T11:07:46.9191953Z Finished upload of test-reports-test-distributed-3-3-linux.8xlarge.nvidia.gpu_9818608637.zip 2022-12-01T11:07:46.9402278Z ##[group]Run seemethere/upload-artifact-s3@v5 2022-12-01T11:07:46.9402917Z with: 2022-12-01T11:07:46.9403184Z s3-prefix: pytorch/pytorch/3591403534/1/artifact 2022-12-01T11:07:46.9403484Z retention-days: 14 2022-12-01T11:07:46.9403755Z if-no-files-found: ignore 2022-12-01T11:07:46.9404014Z path: usage-log-*.zip 2022-12-01T11:07:46.9404265Z name: artifact 2022-12-01T11:07:46.9404517Z s3-bucket: gha-artifacts 2022-12-01T11:07:46.9404761Z region: us-east-1 2022-12-01T11:07:46.9404986Z env: 2022-12-01T11:07:46.9405222Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:07:46.9405466Z GPU_FLAG: --gpus all 2022-12-01T11:07:46.9405725Z ##[endgroup] 2022-12-01T11:07:47.3828908Z NOTE: s3-prefix specified, ignoring name parameter 2022-12-01T11:07:47.3829973Z With the provided path, there will be 1 file uploaded 2022-12-01T11:07:47.3830354Z Uploading to s3 prefix: pytorch/pytorch/3591403534/1/artifact 2022-12-01T11:07:47.3841921Z Starting upload of usage-log-test-distributed-3-3-linux.8xlarge.nvidia.gpu_9818608637.zip 2022-12-01T11:07:47.5774605Z Finished upload of usage-log-test-distributed-3-3-linux.8xlarge.nvidia.gpu_9818608637.zip 2022-12-01T11:07:47.5976236Z ##[group]Run set -x 2022-12-01T11:07:47.5976541Z set -x 2022-12-01T11:07:47.5976824Z python3 -m pip install -r requirements.txt 2022-12-01T11:07:47.5977157Z python3 -m pip install boto3==1.19.12 2022-12-01T11:07:47.5977548Z python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2022-12-01T11:07:47.5991058Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T11:07:47.5991371Z env: 2022-12-01T11:07:47.5991615Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:07:47.5991864Z GPU_FLAG: --gpus all 2022-12-01T11:07:47.5992132Z AWS_DEFAULT_REGION: us-east-1 2022-12-01T11:07:47.5992409Z BRANCH: pull/89997 2022-12-01T11:07:47.5992646Z TEST_CONFIG: distributed 2022-12-01T11:07:47.5992895Z SHARD_NUMBER: 3 2022-12-01T11:07:47.5993213Z BUILD_ENVIRONMENT: linux-bionic-cuda11.6-py3.10-gcc7 2022-12-01T11:07:47.5993514Z PR_NUMBER: 89997 2022-12-01T11:07:47.5993776Z PYTORCH_RETRY_TEST_CASES: 1 2022-12-01T11:07:47.5994098Z PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1 2022-12-01T11:07:47.5994400Z SHA1: c13d400bffe90e16b96520bbc8a41a6f0c9cd584 2022-12-01T11:07:47.5994679Z TAG: 2022-12-01T11:07:47.5994907Z WORKFLOW_ID: 3591403534 2022-12-01T11:07:47.5995351Z GITHUB_TOKEN: *** 2022-12-01T11:07:47.5995602Z 
GHA_WORKFLOW_JOB_ID: 9818608637 2022-12-01T11:07:47.5995861Z ##[endgroup] 2022-12-01T11:07:47.6026289Z + python3 -m pip install -r requirements.txt 2022-12-01T11:07:47.9002030Z Defaulting to user installation because normal site-packages is not writeable 2022-12-01T11:07:47.9343078Z Requirement already satisfied: astunparse in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (1.6.3) 2022-12-01T11:07:47.9380541Z Requirement already satisfied: expecttest in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 3)) (0.1.4) 2022-12-01T11:07:47.9392187Z Requirement already satisfied: future in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 4)) (0.18.2) 2022-12-01T11:07:47.9406514Z Requirement already satisfied: hypothesis in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 5)) (6.58.2) 2022-12-01T11:07:47.9957904Z Requirement already satisfied: numpy in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 6)) (1.21.6) 2022-12-01T11:07:47.9970611Z Requirement already satisfied: psutil in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 7)) (5.9.1) 2022-12-01T11:07:48.0086224Z Requirement already satisfied: pyyaml in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 8)) (6.0) 2022-12-01T11:07:48.0097556Z Requirement already satisfied: requests in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 9)) (2.26.0) 2022-12-01T11:07:48.0351525Z Requirement already satisfied: setuptools in /usr/lib/python3.7/site-packages (from -r requirements.txt (line 10)) (49.1.3) 2022-12-01T11:07:48.0602115Z Requirement already satisfied: six in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 11)) (1.16.0) 2022-12-01T11:07:48.0614873Z Requirement already satisfied: types-dataclasses in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 12)) (0.6.6) 2022-12-01T11:07:48.0623390Z Requirement already satisfied: typing_extensions in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 13)) (4.4.0) 2022-12-01T11:07:48.0637565Z Requirement already satisfied: sympy in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 14)) (1.10.1) 2022-12-01T11:07:48.0667431Z Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from astunparse->-r requirements.txt (line 2)) (0.38.4) 2022-12-01T11:07:48.0690955Z Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (2.4.0) 2022-12-01T11:07:48.0704961Z Requirement already satisfied: attrs>=19.2.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (22.1.0) 2022-12-01T11:07:48.1081575Z Requirement already satisfied: exceptiongroup>=1.0.0; python_version < "3.11" in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (1.0.4) 2022-12-01T11:07:48.1108043Z Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (2022.9.24) 2022-12-01T11:07:48.1120410Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r 
requirements.txt (line 9)) (1.26.13) 2022-12-01T11:07:48.1347586Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (3.4) 2022-12-01T11:07:48.1364675Z Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (2.0.12) 2022-12-01T11:07:48.1390644Z Requirement already satisfied: mpmath>=0.19 in /home/ec2-user/.local/lib/python3.7/site-packages (from sympy->-r requirements.txt (line 14)) (1.2.1) 2022-12-01T11:07:48.2503964Z + python3 -m pip install boto3==1.19.12 2022-12-01T11:07:48.5462082Z Defaulting to user installation because normal site-packages is not writeable 2022-12-01T11:07:48.5692630Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12) 2022-12-01T11:07:48.5764280Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12) 2022-12-01T11:07:48.5825459Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0) 2022-12-01T11:07:48.5853070Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2) 2022-12-01T11:07:48.5889152Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2) 2022-12-01T11:07:48.5917745Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.13) 2022-12-01T11:07:48.6143702Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0) 2022-12-01T11:07:48.8641335Z + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2022-12-01T11:08:00.8313852Z [scribe] Scribe access token not provided, sending report via boto3... 
2022-12-01T11:08:00.8314354Z 2022-12-01T11:08:00.8314889Z ----- Historic stats comparison result ------ 2022-12-01T11:08:00.8315314Z 2022-12-01T11:08:00.8315819Z job: linux-bionic-cuda11.6-py3.10-gcc7 2022-12-01T11:08:00.8316524Z commit: c13d400bffe90e16b96520bbc8a41a6f0c9cd584 2022-12-01T11:08:00.8316864Z 2022-12-01T11:08:00.8317108Z Commit graph (base is most recent master ancestor with at least one S3 report): 2022-12-01T11:08:00.8317358Z 2022-12-01T11:08:00.8317444Z : (master) 2022-12-01T11:08:00.8317681Z | 2022-12-01T11:08:00.8317949Z | * c13d400bff (HEAD) total time 2365.28s 2022-12-01T11:08:00.8318198Z | | 2022-12-01T11:08:00.8318434Z | : (59 commits) 2022-12-01T11:08:00.8318658Z |/ 2022-12-01T11:08:00.8319186Z * 67eb2d5952 (base) 18 reports, total time 3503.90s ± 2079.90s 2022-12-01T11:08:00.8319630Z * 1c5ca724f4 9 reports, total time 3529.71s ± 2080.27s 2022-12-01T11:08:00.8320052Z * 9d6109c4b0 9 reports, total time 3530.22s ± 2110.41s 2022-12-01T11:08:00.8320803Z * 736adc0808 9 reports, total time 3501.80s ± 2137.81s 2022-12-01T11:08:00.8321246Z * a348975e00 9 reports, total time 3551.46s ± 2118.11s 2022-12-01T11:08:00.8321663Z * db13049b88 9 reports, total time 3520.57s ± 2066.49s 2022-12-01T11:08:00.8322080Z * d07b85393a 9 reports, total time 3540.86s ± 2074.85s 2022-12-01T11:08:00.8322907Z * ac25c210e5 9 reports, total time 3549.18s ± 2105.70s 2022-12-01T11:08:00.8323376Z * 2355b6256b 9 reports, total time 3596.87s ± 2152.28s 2022-12-01T11:08:00.8326091Z * 4f95f7ae9b 9 reports, total time 3534.00s ± 2151.51s 2022-12-01T11:08:00.8326400Z | 2022-12-01T11:08:00.8326595Z : 2022-12-01T11:08:00.8326732Z 2022-12-01T11:08:00.8326898Z Removed (across 908 suites) 0 tests, totaling 0.00s 2022-12-01T11:08:00.8327250Z Modified (across 0 suites) 0 tests, totaling 0.00s 2022-12-01T11:08:00.8327598Z Added (across 70 suites) 603 tests, totaling +2365.28s 2022-12-01T11:08:00.8950425Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2022-12-01T11:08:00.8950795Z with: 2022-12-01T11:08:00.8950991Z env: 2022-12-01T11:08:00.8951229Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:08:00.8951491Z GPU_FLAG: --gpus all 2022-12-01T11:08:00.8951721Z ##[endgroup] 2022-12-01T11:08:00.8972260Z ##[group]Run set -eou pipefail 2022-12-01T11:08:00.8972575Z set -eou pipefail 2022-12-01T11:08:00.8972842Z  2022-12-01T11:08:00.8973163Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2022-12-01T11:08:00.8973485Z for _ in $(seq 1440); do 2022-12-01T11:08:00.8973788Z  # Break if no ssh session exists anymore 2022-12-01T11:08:00.8974084Z  if [ "$(who)" = "" ]; then 2022-12-01T11:08:00.8974314Z  break 2022-12-01T11:08:00.8974544Z  fi 2022-12-01T11:08:00.8974767Z  echo "." 
2022-12-01T11:08:00.8975003Z  sleep 5 2022-12-01T11:08:00.8975254Z done 2022-12-01T11:08:00.8988517Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T11:08:00.8988800Z env: 2022-12-01T11:08:00.8989039Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:08:00.8989303Z GPU_FLAG: --gpus all 2022-12-01T11:08:00.8989530Z ##[endgroup] 2022-12-01T11:08:00.9019896Z Holding runner for 2 hours until all ssh sessions have logged out 2022-12-01T11:08:00.9070132Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2022-12-01T11:08:00.9070697Z # ignore expansion of "docker ps -q" since it could be empty 2022-12-01T11:08:00.9071041Z # shellcheck disable=SC2046 2022-12-01T11:08:00.9071350Z docker stop $(docker ps -q) || true 2022-12-01T11:08:00.9071647Z # Prune all of the docker images 2022-12-01T11:08:00.9071938Z docker system prune -af 2022-12-01T11:08:00.9084635Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-12-01T11:08:00.9084939Z env: 2022-12-01T11:08:00.9085186Z GIT_DEFAULT_BRANCH: master 2022-12-01T11:08:00.9085437Z GPU_FLAG: --gpus all 2022-12-01T11:08:00.9085683Z ##[endgroup] 2022-12-01T11:08:01.4664846Z 66307f3ad701 2022-12-01T11:08:02.2509502Z Deleted Containers: 2022-12-01T11:08:02.2509987Z 66307f3ad701b5e8a32dadc4f1fd99633efca32cc1fa29cf0780fcc5b09884b4 2022-12-01T11:08:02.2510244Z 2022-12-01T11:08:07.3705780Z Deleted Images: 2022-12-01T11:08:07.3706938Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fa72f5a0a230eb632055220542038bd4ceca184b 2022-12-01T11:08:07.3708216Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7@sha256:217fd7de680e1dd5bca4b2b4054bd05a8d454df5e210ffbf1e5955e01cf1f340 2022-12-01T11:08:07.3708854Z deleted: sha256:f1c05e1c4a24b2f30e4b2dd175e18efaf97173a54d9813ad2e6088a740690342 2022-12-01T11:08:07.3709554Z deleted: sha256:1e7def21582cf03150a2e3ad04e2701388eaa422b3720fe7f648241a8988388e 2022-12-01T11:08:07.3710006Z deleted: sha256:3a7e72ba714442f54696c557b74dca88c618446caa68ff6506c2ce93cd7b2a85 2022-12-01T11:08:07.3710630Z deleted: sha256:acc47a47b0a4383dfbd2bd91be28c265a6486f388812591a0a9c185c62047c0a 2022-12-01T11:08:07.3711118Z deleted: sha256:60f22f4a0a4afdbef80bed81f20ae4c7a54233808a9544253a3c30a2776d2f71 2022-12-01T11:08:07.3711649Z deleted: sha256:5efa4324ff86638bcfa35ab9b1debaddf3d7513732c27ca8dd7e60e8358da6c9 2022-12-01T11:08:07.3712328Z deleted: sha256:a7f286562f5d9f9a1490bbc0a8125ba19aefca94af08516be2493eeb191f127f 2022-12-01T11:08:07.3712778Z deleted: sha256:e403df4fa1d82e23b8db68b6e2b5562fcd55535dd72214b9421e7814eee51ffd 2022-12-01T11:08:07.3713450Z deleted: sha256:f9471dd8edb7c2e1eb3aefda26ec6e67ee49347262f97f3111f176eeb2bf9465 2022-12-01T11:08:07.3713909Z deleted: sha256:d6df7f8a51a6e7483b61b1f330f3a4417b815fce5a7ad4c2a89510a54f5537da 2022-12-01T11:08:07.3714410Z deleted: sha256:b12c66528bc64f807b7a52116ecdf950025b66a403dd328a33db5511b63fc3f1 2022-12-01T11:08:07.3715187Z deleted: sha256:e8a84d179a24426205a63771d9aa894a6889147decc6d227312415e2842d0e38 2022-12-01T11:08:07.3715642Z deleted: sha256:b1e802e06103bb69d2a413954707d2f281a894d36a1f7fa3a65efa626a0e5e43 2022-12-01T11:08:07.3716313Z deleted: sha256:8efc9a5c3ed0496241d3ce1f76339b4d1270b5470ce1429effc703ca61a04c50 2022-12-01T11:08:07.3716741Z deleted: sha256:fef4f8c199945077358334d285e9dbe762a116427ec29d567139bee51863bef4 2022-12-01T11:08:07.3717287Z deleted: sha256:3f9f9d5fa745067d0ddf7f1206fcf51b87304adb8d32c87e130cbaae4acee275 
2022-12-01T11:08:07.3717843Z deleted: sha256:75fb962b88e95b55a260c8cebb5da299635f32f3b104d08b08a2e1fba788437b 2022-12-01T11:08:07.3718283Z deleted: sha256:bd2c58efdd30f071840e8b3fd569126cf238bfdd5f0cd18345e36b60f93489df 2022-12-01T11:08:07.3718962Z deleted: sha256:d72aae1f9d4b7e1acc7b1954fed6c4886c49cec893a48dac5337fee3c9bf4c08 2022-12-01T11:08:07.3719416Z deleted: sha256:2b0461664863c4ef4ad02da7c9ec92a1d6d08bc8a9b76b5a982b7a98549980d3 2022-12-01T11:08:07.3719988Z deleted: sha256:086197a6cf735060ac5a6e595333f0bc14643bfddd3c338439c22002c1bec257 2022-12-01T11:08:07.3720544Z deleted: sha256:64254e4d362f949e9732531c579ca041aa47d4f70f7c242fda19048d82df0a1d 2022-12-01T11:08:07.3720945Z deleted: sha256:25374ed655bf3ce154d4ee14bd8c4f3220431d67b0d18acf3efdd666bbc01285 2022-12-01T11:08:07.3721624Z deleted: sha256:70a8b1a97ab5caea7ba2d2ecea43b58781672c588ee6375271d2b5f04d31594c 2022-12-01T11:08:07.3722074Z deleted: sha256:758af05d13f6edc84594db4db15f7b8066c5da10f6d83717a1d8b6ae5a2caccf 2022-12-01T11:08:07.3723061Z deleted: sha256:b37c30a9368604ae9d1873491a752ae71aba89de73dec5588638c1d2e3917901 2022-12-01T11:08:07.3723489Z deleted: sha256:790f109bf6ce127a4f3e25ea1675387993204350100fca6cecb56f8b863ce2e5 2022-12-01T11:08:07.3724008Z deleted: sha256:df13ec8c62d036e4b9fe0fd768e7fae7c5f76430b4aae4f3f356d701621b86d8 2022-12-01T11:08:07.3724629Z deleted: sha256:ef027ba5732bd4bc180cc8f8d769cd28630e92609a88be8f472e4c8ee8a3bb6d 2022-12-01T11:08:07.3725045Z deleted: sha256:5751424dd4ee3deaaf3e9f8b8172683ff7f30d90dc2415c3833042333f52b619 2022-12-01T11:08:07.3725473Z deleted: sha256:ac0364f6b4441a779e12c1527f256b052fff92b14aca383c87e62a8d6e4d1470 2022-12-01T11:08:07.3726130Z deleted: sha256:1f8161738ca37b8a92ed2577a6d9705167fb325ad8f8512439b47251013af048 2022-12-01T11:08:07.3726890Z deleted: sha256:2f8770a5227a4d66f9bdea07ca3e6e7cda5b06fbc362b354ae29f7542b3e7a19 2022-12-01T11:08:07.3727493Z deleted: sha256:652470cd61eae170df59d88a3b038cd2ab199526f4ba54badf85dd5628d7cb37 2022-12-01T11:08:07.3727948Z deleted: sha256:155eec0e8b217d2bafa06b3f85e987c3164bbad427de092045ddbff33f091a1f 2022-12-01T11:08:07.3728387Z deleted: sha256:8c60c1d76e4c0216ef0cc9248e31a656ab0a29bc95f589f4b23a0634523058c7 2022-12-01T11:08:07.3728783Z deleted: sha256:2811ebe06c45b7997659f3d50562406d001f3e979f398f9b4844bac84b3c419e 2022-12-01T11:08:07.3729207Z deleted: sha256:6a690d8e14cafcab7ab578a42baf2ca075033cf306f03e13ae70150b8be2e586 2022-12-01T11:08:07.3729650Z deleted: sha256:635a5f6faf92c02fc91f0a365e65bb840b74b3b7c079ee1009e6c15cd7c0e7b4 2022-12-01T11:08:07.3730111Z deleted: sha256:2696ecc06a0fc00b7ef534ef78758030c9d69f082f8fcb2f5ab275c6fc199e7c 2022-12-01T11:08:07.3730560Z deleted: sha256:cec67903bb9f9cc16c8826dff0d76c876d88147fecd2dec21ebc0adc56e4f149 2022-12-01T11:08:07.3730994Z deleted: sha256:6d95f163e75e744dc79893bc297644cf2a70187540d68c847baeb77483c6a0f5 2022-12-01T11:08:07.3731426Z deleted: sha256:1b1204e913bbc6e599fc1cdcabede5cd1315dc1b55cd67052bc665a8f8e3cc1a 2022-12-01T11:08:07.3731843Z deleted: sha256:93670f5e92be335224cf3a8b69a526306d2fec5e5e4e42c272b8c0458060036a 2022-12-01T11:08:07.3732258Z deleted: sha256:a0b1388f987804a153c13745107eb2a30de9cc6dc9eab87ae191707765dd7175 2022-12-01T11:08:07.3732670Z deleted: sha256:129bdb873e79117f4e90135f0c6a58f775fcf596f4eb514b803771cef2da8278 2022-12-01T11:08:07.3733099Z deleted: sha256:2d49e3a81bd436bfd20fb4a849cdc98da82cb74afef3de38dda7a946d3fc4153 2022-12-01T11:08:07.3733549Z deleted: sha256:0ba4e259108e5311ddf6b79ae3a35f8f16a4004ef8817e50427baa3cc90ac081 2022-12-01T11:08:07.3734109Z 
deleted: sha256:c164403226561914f16becdeca65c54d20dba8dad414b062efc34c05c47bf725 2022-12-01T11:08:07.3734542Z deleted: sha256:cbe4006b2e6286d50c1b292fb71b69d5299d65f055285519eafc41eac3ef8a3c 2022-12-01T11:08:07.3734978Z deleted: sha256:edcec18dceb25f1a03ec20de4676464613e69072875a83f5c45e45a31aafc5b9 2022-12-01T11:08:07.3735401Z deleted: sha256:13c4f317ac4bb48997302756b8d5f8b602e835607c9806a1a5b200e9a0657d8a 2022-12-01T11:08:07.3735812Z deleted: sha256:57f043e380f4586c76968d6e062b50bac55254a5be7e80bea3c027a5bb316469 2022-12-01T11:08:07.3736209Z deleted: sha256:3e549931e0240b9aac25dc79ed6a6259863879a5c9bd20755f77cac27c1ab8c8 2022-12-01T11:08:07.3736444Z 2022-12-01T11:08:07.3736590Z Total reclaimed space: 18.87GB 2022-12-01T11:08:07.3791303Z Post job cleanup. 2022-12-01T11:08:07.3831920Z Post job cleanup. 2022-12-01T11:08:07.5214669Z [command]/usr/bin/git version 2022-12-01T11:08:07.5269585Z git version 2.37.1 2022-12-01T11:08:07.5334241Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/f8465d8e-1f7b-4590-b4c6-b5cb8b62e607' before making global git config changes 2022-12-01T11:08:07.5334812Z Adding repository directory to the temporary git global config as a safe directory 2022-12-01T11:08:07.5341244Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2022-12-01T11:08:07.5386340Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2022-12-01T11:08:07.5423163Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || : 2022-12-01T11:08:07.5751094Z Entering 'android/libs/fbjni' 2022-12-01T11:08:07.5793329Z Entering 'third_party/FP16' 2022-12-01T11:08:07.5836463Z Entering 'third_party/FXdiv' 2022-12-01T11:08:07.5878958Z Entering 'third_party/NNPACK' 2022-12-01T11:08:07.5922181Z Entering 'third_party/QNNPACK' 2022-12-01T11:08:07.5964611Z Entering 'third_party/VulkanMemoryAllocator' 2022-12-01T11:08:07.6007752Z Entering 'third_party/XNNPACK' 2022-12-01T11:08:07.6064321Z Entering 'third_party/benchmark' 2022-12-01T11:08:07.6105725Z Entering 'third_party/cpuinfo' 2022-12-01T11:08:07.6148491Z Entering 'third_party/cub' 2022-12-01T11:08:07.6189848Z Entering 'third_party/cudnn_frontend' 2022-12-01T11:08:07.6238547Z Entering 'third_party/cutlass' 2022-12-01T11:08:07.6289857Z Entering 'third_party/eigen' 2022-12-01T11:08:07.6334939Z Entering 'third_party/fbgemm' 2022-12-01T11:08:07.6376935Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-12-01T11:08:07.6418245Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-12-01T11:08:07.6460683Z Entering 'third_party/fbgemm/third_party/googletest' 2022-12-01T11:08:07.6503278Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2022-12-01T11:08:07.6546521Z Entering 'third_party/flatbuffers' 2022-12-01T11:08:07.6593526Z Entering 'third_party/fmt' 2022-12-01T11:08:07.6635090Z Entering 'third_party/foxi' 2022-12-01T11:08:07.6677508Z Entering 'third_party/gemmlowp/gemmlowp' 2022-12-01T11:08:07.6719083Z Entering 'third_party/gloo' 2022-12-01T11:08:07.6760778Z Entering 'third_party/googletest' 2022-12-01T11:08:07.6802042Z Entering 'third_party/ideep' 2022-12-01T11:08:07.6845347Z Entering 'third_party/ideep/mkl-dnn' 2022-12-01T11:08:07.6889173Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-12-01T11:08:07.6938157Z Entering 'third_party/ios-cmake' 2022-12-01T11:08:07.6979849Z Entering 'third_party/ittapi' 
2022-12-01T11:08:07.7022545Z Entering 'third_party/kineto'
2022-12-01T11:08:07.7064543Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2022-12-01T11:08:07.7106739Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2022-12-01T11:08:07.7150248Z Entering 'third_party/nccl/nccl'
2022-12-01T11:08:07.7193229Z Entering 'third_party/neon2sse'
2022-12-01T11:08:07.7235921Z Entering 'third_party/nlohmann'
2022-12-01T11:08:07.7279430Z Entering 'third_party/onnx'
2022-12-01T11:08:07.7338137Z Entering 'third_party/onnx/third_party/benchmark'
2022-12-01T11:08:07.7381635Z Entering 'third_party/onnx/third_party/pybind11'
2022-12-01T11:08:07.7429208Z Entering 'third_party/onnx-tensorrt'
2022-12-01T11:08:07.7472156Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2022-12-01T11:08:07.7520456Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2022-12-01T11:08:07.7563073Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2022-12-01T11:08:07.7605969Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2022-12-01T11:08:07.7653186Z Entering 'third_party/pocketfft'
2022-12-01T11:08:07.7696223Z Entering 'third_party/protobuf'
2022-12-01T11:08:07.7742723Z Entering 'third_party/protobuf/third_party/benchmark'
2022-12-01T11:08:07.7785112Z Entering 'third_party/protobuf/third_party/googletest'
2022-12-01T11:08:07.7829156Z Entering 'third_party/psimd'
2022-12-01T11:08:07.7871285Z Entering 'third_party/pthreadpool'
2022-12-01T11:08:07.7913665Z Entering 'third_party/pybind11'
2022-12-01T11:08:07.7956489Z Entering 'third_party/python-enum'
2022-12-01T11:08:07.7998684Z Entering 'third_party/python-peachpy'
2022-12-01T11:08:07.8041607Z Entering 'third_party/python-six'
2022-12-01T11:08:07.8084251Z Entering 'third_party/sleef'
2022-12-01T11:08:07.8127079Z Entering 'third_party/tbb'
2022-12-01T11:08:07.8172278Z Entering 'third_party/tensorpipe'
2022-12-01T11:08:07.8215269Z Entering 'third_party/tensorpipe/third_party/googletest'
2022-12-01T11:08:07.8256862Z Entering 'third_party/tensorpipe/third_party/libnop'
2022-12-01T11:08:07.8298352Z Entering 'third_party/tensorpipe/third_party/libuv'
2022-12-01T11:08:07.8341241Z Entering 'third_party/tensorpipe/third_party/pybind11'
2022-12-01T11:08:07.8383405Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2022-12-01T11:08:07.8428590Z Entering 'third_party/zstd'
2022-12-01T11:08:07.8490456Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2022-12-01T11:08:07.8522278Z http.https://github.com/.extraheader
2022-12-01T11:08:07.8532816Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2022-12-01T11:08:07.8570753Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
2022-12-01T11:08:07.8893260Z Entering 'android/libs/fbjni'
2022-12-01T11:08:07.8917844Z http.https://github.com/.extraheader
2022-12-01T11:08:07.8951524Z Entering 'third_party/FP16'
2022-12-01T11:08:07.8977585Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9010358Z Entering 'third_party/FXdiv'
2022-12-01T11:08:07.9035154Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9068530Z Entering 'third_party/NNPACK'
2022-12-01T11:08:07.9093471Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9127147Z Entering 'third_party/QNNPACK'
2022-12-01T11:08:07.9151552Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9184892Z Entering 'third_party/VulkanMemoryAllocator'
2022-12-01T11:08:07.9210537Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9242987Z Entering 'third_party/XNNPACK'
2022-12-01T11:08:07.9268008Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9312675Z Entering 'third_party/benchmark'
2022-12-01T11:08:07.9337938Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9372147Z Entering 'third_party/cpuinfo'
2022-12-01T11:08:07.9397798Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9431072Z Entering 'third_party/cub'
2022-12-01T11:08:07.9455725Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9489246Z Entering 'third_party/cudnn_frontend'
2022-12-01T11:08:07.9513783Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9553749Z Entering 'third_party/cutlass'
2022-12-01T11:08:07.9578603Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9618835Z Entering 'third_party/eigen'
2022-12-01T11:08:07.9643632Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9678399Z Entering 'third_party/fbgemm'
2022-12-01T11:08:07.9704337Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9737517Z Entering 'third_party/fbgemm/third_party/asmjit'
2022-12-01T11:08:07.9761996Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9794898Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2022-12-01T11:08:07.9820604Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9854794Z Entering 'third_party/fbgemm/third_party/googletest'
2022-12-01T11:08:07.9878720Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9912820Z Entering 'third_party/fbgemm/third_party/hipify_torch'
2022-12-01T11:08:07.9938752Z http.https://github.com/.extraheader
2022-12-01T11:08:07.9973265Z Entering 'third_party/flatbuffers'
2022-12-01T11:08:07.9997578Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0032311Z Entering 'third_party/fmt'
2022-12-01T11:08:08.0057798Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0090549Z Entering 'third_party/foxi'
2022-12-01T11:08:08.0114750Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0147171Z Entering 'third_party/gemmlowp/gemmlowp'
2022-12-01T11:08:08.0173135Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0205924Z Entering 'third_party/gloo'
2022-12-01T11:08:08.0230804Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0264747Z Entering 'third_party/googletest'
2022-12-01T11:08:08.0289925Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0323980Z Entering 'third_party/ideep'
2022-12-01T11:08:08.0348666Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0380690Z Entering 'third_party/ideep/mkl-dnn'
2022-12-01T11:08:08.0405672Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0439932Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2022-12-01T11:08:08.0464655Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0504243Z Entering 'third_party/ios-cmake'
2022-12-01T11:08:08.0529384Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0561563Z Entering 'third_party/ittapi'
2022-12-01T11:08:08.0587355Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0620243Z Entering 'third_party/kineto'
2022-12-01T11:08:08.0645151Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0677754Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2022-12-01T11:08:08.0703309Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0736378Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2022-12-01T11:08:08.0761546Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0796279Z Entering 'third_party/nccl/nccl'
2022-12-01T11:08:08.0821623Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0854324Z Entering 'third_party/neon2sse'
2022-12-01T11:08:08.0878761Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0911365Z Entering 'third_party/nlohmann'
2022-12-01T11:08:08.0936175Z http.https://github.com/.extraheader
2022-12-01T11:08:08.0970568Z Entering 'third_party/onnx'
2022-12-01T11:08:08.0995506Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1042164Z Entering 'third_party/onnx/third_party/benchmark'
2022-12-01T11:08:08.1068111Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1101336Z Entering 'third_party/onnx/third_party/pybind11'
2022-12-01T11:08:08.1125363Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1160268Z Entering 'third_party/onnx-tensorrt'
2022-12-01T11:08:08.1185567Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1217419Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2022-12-01T11:08:08.1242093Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1281275Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2022-12-01T11:08:08.1306095Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1339768Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2022-12-01T11:08:08.1364470Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1397090Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2022-12-01T11:08:08.1423009Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1462403Z Entering 'third_party/pocketfft'
2022-12-01T11:08:08.1487852Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1519965Z Entering 'third_party/protobuf'
2022-12-01T11:08:08.1544911Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1581267Z Entering 'third_party/protobuf/third_party/benchmark'
2022-12-01T11:08:08.1605978Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1638062Z Entering 'third_party/protobuf/third_party/googletest'
2022-12-01T11:08:08.1663487Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1698030Z Entering 'third_party/psimd'
2022-12-01T11:08:08.1723146Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1755393Z Entering 'third_party/pthreadpool'
2022-12-01T11:08:08.1780168Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1812506Z Entering 'third_party/pybind11'
2022-12-01T11:08:08.1837388Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1870216Z Entering 'third_party/python-enum'
2022-12-01T11:08:08.1895551Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1928545Z Entering 'third_party/python-peachpy'
2022-12-01T11:08:08.1952832Z http.https://github.com/.extraheader
2022-12-01T11:08:08.1985850Z Entering 'third_party/python-six'
2022-12-01T11:08:08.2010999Z http.https://github.com/.extraheader
2022-12-01T11:08:08.2044790Z Entering 'third_party/sleef'
2022-12-01T11:08:08.2069282Z http.https://github.com/.extraheader
2022-12-01T11:08:08.2102538Z Entering 'third_party/tbb'
2022-12-01T11:08:08.2127084Z http.https://github.com/.extraheader
2022-12-01T11:08:08.2161592Z Entering 'third_party/tensorpipe'
2022-12-01T11:08:08.2187239Z http.https://github.com/.extraheader
2022-12-01T11:08:08.2220780Z Entering 'third_party/tensorpipe/third_party/googletest'
2022-12-01T11:08:08.2244838Z http.https://github.com/.extraheader
2022-12-01T11:08:08.2277478Z Entering 'third_party/tensorpipe/third_party/libnop'
2022-12-01T11:08:08.2302041Z http.https://github.com/.extraheader
2022-12-01T11:08:08.2334556Z Entering 'third_party/tensorpipe/third_party/libuv'
2022-12-01T11:08:08.2358832Z http.https://github.com/.extraheader
2022-12-01T11:08:08.2391916Z Entering 'third_party/tensorpipe/third_party/pybind11'
2022-12-01T11:08:08.2416133Z http.https://github.com/.extraheader
2022-12-01T11:08:08.2449102Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2022-12-01T11:08:08.2473377Z http.https://github.com/.extraheader
2022-12-01T11:08:08.2509254Z Entering 'third_party/zstd'
2022-12-01T11:08:08.2534353Z http.https://github.com/.extraheader
2022-12-01T11:08:08.2846079Z Cleaning up orphan processes
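
Note (not part of the runner log): the commands logged above are the post-job clean-up that removes the http.https://github.com/.extraheader entry, which carries the checkout auth token, from the main repository's local git config and from every submodule via git submodule foreach --recursive. The following is a minimal, hypothetical Python sketch of that same pattern, added here only for illustration; the helper names (unset_extraheader, cleanup) are invented, and only git flags that appear in the log are used.

# Sketch of the extraheader clean-up shown in the log above (assumptions noted inline).
import subprocess

EXTRAHEADER_KEY = "http.https://github.com/.extraheader"

def unset_extraheader(repo_dir: str) -> None:
    # Probe for the key first (mirrors `git config --local --name-only --get-regexp ...`),
    # then unset all matching values (mirrors `git config --local --unset-all ...`).
    probe = subprocess.run(
        ["git", "config", "--local", "--name-only", "--get-regexp",
         r"http\.https\:\/\/github\.com\/\.extraheader"],
        cwd=repo_dir, capture_output=True, text=True,
    )
    if probe.returncode == 0 and probe.stdout.strip():
        subprocess.run(
            ["git", "config", "--local", "--unset-all", EXTRAHEADER_KEY],
            cwd=repo_dir, check=True,
        )

def cleanup(repo_dir: str) -> None:
    # Clean the top-level checkout first...
    unset_extraheader(repo_dir)
    # ...then every submodule, recursively, as `git submodule foreach --recursive` does;
    # `|| :` keeps the foreach command succeeding when a submodule has no such entry.
    subprocess.run(
        ["git", "submodule", "foreach", "--recursive",
         f"git config --local --unset-all '{EXTRAHEADER_KEY}' || :"],
        cwd=repo_dir, check=True,
    )

if __name__ == "__main__":
    cleanup(".")  # assumes it is run from inside a checked-out workspace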