2022-05-18T03:57:55.2198300Z Requested labels: linux.8xlarge.nvidia.gpu 2022-05-18T03:57:55.2198397Z Job defined at: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/master 2022-05-18T03:57:55.2198422Z Waiting for a runner to pick up this job... 2022-05-18T03:57:57.5449322Z Job is about to start running on the runner: i-023c3009b9c09a97d (repository) 2022-05-18T03:58:03.6870305Z Current runner version: '2.291.1' 2022-05-18T03:58:03.6878105Z Runner name: 'i-023c3009b9c09a97d' 2022-05-18T03:58:03.6878927Z Runner group name: 'Default' 2022-05-18T03:58:03.6879791Z Machine name: 'ip-10-0-4-191' 2022-05-18T03:58:03.6882481Z ##[group]GITHUB_TOKEN Permissions 2022-05-18T03:58:03.6883512Z Actions: write 2022-05-18T03:58:03.6884000Z Checks: write 2022-05-18T03:58:03.6884379Z Contents: write 2022-05-18T03:58:03.6884820Z Deployments: write 2022-05-18T03:58:03.6885281Z Discussions: write 2022-05-18T03:58:03.6885702Z Issues: write 2022-05-18T03:58:03.6886127Z Metadata: read 2022-05-18T03:58:03.6886564Z Packages: write 2022-05-18T03:58:03.6886953Z Pages: write 2022-05-18T03:58:03.6887410Z PullRequests: write 2022-05-18T03:58:03.6887963Z RepositoryProjects: write 2022-05-18T03:58:03.6888397Z SecurityEvents: write 2022-05-18T03:58:03.6888845Z Statuses: write 2022-05-18T03:58:03.6889298Z ##[endgroup] 2022-05-18T03:58:03.6893504Z Secret source: Actions 2022-05-18T03:58:03.6894481Z Prepare workflow directory 2022-05-18T03:58:03.9784460Z Prepare all required actions 2022-05-18T03:58:04.0020499Z Getting action download info 2022-05-18T03:58:04.1888443Z Download action repository 'pytorch/pytorch@master' (SHA:7b8cf1f7366bff95e9954037a58a8bb0edaaebd3) 2022-05-18T03:58:07.2010001Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a) 2022-05-18T03:58:07.3028797Z Download action repository 'seemethere/upload-artifact-s3@v4' (SHA:c1c31f57581a11fe6d4d052da6276adb2df71f1e) 2022-05-18T03:58:07.5864487Z Getting action download info 2022-05-18T03:58:07.7207582Z Download action repository 'malfet/checkout@silent-checkout' (SHA:f63e9e15406be6060f159846cd2e098f759c5246) 2022-05-18T03:58:07.9077085Z Getting action download info 2022-05-18T03:58:08.1600911Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@master 2022-05-18T03:58:08.1601299Z with: 2022-05-18T03:58:08.1601558Z submodules: recursive 2022-05-18T03:58:08.1601820Z fetch-depth: 0 2022-05-18T03:58:08.1602064Z env: 2022-05-18T03:58:08.1602288Z IN_CI: 1 2022-05-18T03:58:08.1602513Z IS_GHA: 1 2022-05-18T03:58:08.1602748Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:58:08.1603035Z ##[endgroup] 2022-05-18T03:58:08.1897073Z ##[group]Run echo "${GITHUB_WORKSPACE}" 2022-05-18T03:58:08.1897443Z echo "${GITHUB_WORKSPACE}" 2022-05-18T03:58:08.1897756Z if [ -z "${NO_SUDO}" ]; then 2022-05-18T03:58:08.1898069Z  sudo rm -rf "${GITHUB_WORKSPACE}" 2022-05-18T03:58:08.1898325Z else 2022-05-18T03:58:08.1898586Z  rm -rf "${GITHUB_WORKSPACE}" 2022-05-18T03:58:08.1898846Z fi 2022-05-18T03:58:08.1899099Z mkdir "${GITHUB_WORKSPACE}" 2022-05-18T03:58:08.1918281Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T03:58:08.1918618Z env: 2022-05-18T03:58:08.1918827Z IN_CI: 1 2022-05-18T03:58:08.1919054Z IS_GHA: 1 2022-05-18T03:58:08.1919303Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:58:08.1919545Z NO_SUDO: 2022-05-18T03:58:08.1919782Z ##[endgroup] 2022-05-18T03:58:08.2141102Z /home/ec2-user/actions-runner/_work/pytorch/pytorch 2022-05-18T03:58:10.5990707Z ##[group]Run 
malfet/checkout@silent-checkout 2022-05-18T03:58:10.5991067Z with: 2022-05-18T03:58:10.5991345Z ref: 3b2375291aab7b48442f2e6fb1ef66cebc761e24 2022-05-18T03:58:10.5991618Z fetch-depth: 0 2022-05-18T03:58:10.5991879Z submodules: recursive 2022-05-18T03:58:10.5992144Z quiet-checkout: true 2022-05-18T03:58:10.5992406Z repository: pytorch/pytorch 2022-05-18T03:58:10.5992861Z token: *** 2022-05-18T03:58:10.5993110Z ssh-strict: true 2022-05-18T03:58:10.5993382Z persist-credentials: true 2022-05-18T03:58:10.5993631Z clean: true 2022-05-18T03:58:10.5993861Z lfs: false 2022-05-18T03:58:10.5994130Z set-safe-directory: true 2022-05-18T03:58:10.5994366Z env: 2022-05-18T03:58:10.5994580Z IN_CI: 1 2022-05-18T03:58:10.5994800Z IS_GHA: 1 2022-05-18T03:58:10.5995030Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:58:10.5995288Z ##[endgroup] 2022-05-18T03:58:10.7520974Z Syncing repository: pytorch/pytorch 2022-05-18T03:58:10.7522860Z ##[group]Getting Git version info 2022-05-18T03:58:10.7523412Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch' 2022-05-18T03:58:10.7524007Z [command]/usr/bin/git version 2022-05-18T03:58:10.7524283Z git version 2.32.0 2022-05-18T03:58:10.7533790Z ##[endgroup] 2022-05-18T03:58:10.7555613Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/1c01ef46-9b48-41ce-9a8f-1ad438d3877a' before making global git config changes 2022-05-18T03:58:10.7556211Z Adding repository directory to the temporary git global config as a safe directory 2022-05-18T03:58:10.7564483Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2022-05-18T03:58:10.7607542Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch' 2022-05-18T03:58:10.7613236Z ##[group]Initializing the repository 2022-05-18T03:58:10.7619590Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch 2022-05-18T03:58:10.7654259Z hint: Using 'master' as the name for the initial branch. This default branch name 2022-05-18T03:58:10.7654734Z hint: is subject to change. To configure the initial branch name to use in all 2022-05-18T03:58:10.7655189Z hint: of your new repositories, which will suppress this warning, call: 2022-05-18T03:58:10.7655685Z hint: 2022-05-18T03:58:10.7656057Z hint: git config --global init.defaultBranch 2022-05-18T03:58:10.7656359Z hint: 2022-05-18T03:58:10.7656754Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and 2022-05-18T03:58:10.7657270Z hint: 'development'. 
The just-created branch can be renamed via this command: 2022-05-18T03:58:10.7657581Z hint: 2022-05-18T03:58:10.7658032Z hint: git branch -m 2022-05-18T03:58:10.7658772Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/ 2022-05-18T03:58:10.7670689Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch 2022-05-18T03:58:10.7706286Z ##[endgroup] 2022-05-18T03:58:10.7706786Z ##[group]Disabling automatic garbage collection 2022-05-18T03:58:10.7712234Z [command]/usr/bin/git config --local gc.auto 0 2022-05-18T03:58:10.7744715Z ##[endgroup] 2022-05-18T03:58:10.7745173Z ##[group]Setting up auth 2022-05-18T03:58:10.7755261Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2022-05-18T03:58:10.7793635Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || : 2022-05-18T03:58:10.8098753Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2022-05-18T03:58:10.8132256Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || : 2022-05-18T03:58:10.8444864Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2022-05-18T03:58:10.8494151Z ##[endgroup] 2022-05-18T03:58:10.8494634Z ##[group]Fetching the repository 2022-05-18T03:58:10.8503273Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --quiet --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2022-05-18T03:58:51.9002269Z [command]/usr/bin/git rev-parse --verify --quiet 3b2375291aab7b48442f2e6fb1ef66cebc761e24^{object} 2022-05-18T03:58:51.9032857Z 3b2375291aab7b48442f2e6fb1ef66cebc761e24 2022-05-18T03:58:51.9040744Z ##[endgroup] 2022-05-18T03:58:51.9041262Z ##[group]Determining the checkout info 2022-05-18T03:58:51.9041738Z ##[endgroup] 2022-05-18T03:58:51.9042194Z ##[group]Checking out the ref 2022-05-18T03:58:51.9047005Z [command]/usr/bin/git checkout --quiet --force 3b2375291aab7b48442f2e6fb1ef66cebc761e24 2022-05-18T03:58:53.5154903Z ##[endgroup] 2022-05-18T03:58:53.5155709Z ##[group]Setting up auth for fetching submodules 2022-05-18T03:58:53.5163004Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2022-05-18T03:58:53.5221935Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2022-05-18T03:58:53.5256105Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2022-05-18T03:58:53.5289409Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2022-05-18T03:58:53.5321936Z ##[endgroup] 2022-05-18T03:58:53.5322375Z ##[group]Fetching submodules 2022-05-18T03:58:53.5328783Z [command]/usr/bin/git submodule sync --recursive 2022-05-18T03:58:53.5661514Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2022-05-18T03:58:53.5972306Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni' 2022-05-18T03:58:53.5974599Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16' 2022-05-18T03:58:53.5977485Z Submodule 
'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2022-05-18T03:58:53.5980532Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK' 2022-05-18T03:58:53.5983945Z Submodule 'third_party/QNNPACK' (https://github.com/pytorch/QNNPACK) registered for path 'third_party/QNNPACK' 2022-05-18T03:58:53.5987719Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2022-05-18T03:58:53.5991496Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2022-05-18T03:58:53.5995012Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2022-05-18T03:58:53.5998701Z Submodule 'third_party/cub' (https://github.com/NVlabs/cub.git) registered for path 'third_party/cub' 2022-05-18T03:58:53.6002829Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2022-05-18T03:58:53.6006714Z Submodule 'third_party/eigen' (https://gitlab.com/libeigen/eigen.git) registered for path 'third_party/eigen' 2022-05-18T03:58:53.6010798Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2022-05-18T03:58:53.6015164Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers' 2022-05-18T03:58:53.6019683Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2022-05-18T03:58:53.6024411Z Submodule 'third_party/foxi' (https://github.com/houseroad/foxi.git) registered for path 'third_party/foxi' 2022-05-18T03:58:53.6029587Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2022-05-18T03:58:53.6034446Z Submodule 'third_party/gloo' (https://github.com/facebookincubator/gloo) registered for path 'third_party/gloo' 2022-05-18T03:58:53.6039410Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2022-05-18T03:58:53.6044477Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2022-05-18T03:58:53.6049723Z Submodule 'third_party/ios-cmake' (https://github.com/Yangqing/ios-cmake.git) registered for path 'third_party/ios-cmake' 2022-05-18T03:58:53.6054957Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2022-05-18T03:58:53.6060506Z Submodule 'third_party/nccl/nccl' (https://github.com/NVIDIA/nccl) registered for path 'third_party/nccl/nccl' 2022-05-18T03:58:53.6067570Z Submodule 'third_party/neon2sse' (https://github.com/intel/ARM_NEON_2_x86_SSE.git) registered for path 'third_party/neon2sse' 2022-05-18T03:58:53.6073349Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2022-05-18T03:58:53.6079204Z Submodule 'third_party/onnx-tensorrt' (https://github.com/onnx/onnx-tensorrt) registered for path 'third_party/onnx-tensorrt' 2022-05-18T03:58:53.6085205Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2022-05-18T03:58:53.6091347Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) 
registered for path 'third_party/protobuf' 2022-05-18T03:58:53.6097642Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2022-05-18T03:58:53.6104396Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2022-05-18T03:58:53.6111117Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2022-05-18T03:58:53.6117835Z Submodule 'third_party/python-enum' (https://github.com/PeachPy/enum34.git) registered for path 'third_party/python-enum' 2022-05-18T03:58:53.6124611Z Submodule 'third_party/python-peachpy' (https://github.com/Maratyszcza/PeachPy.git) registered for path 'third_party/python-peachpy' 2022-05-18T03:58:53.6131410Z Submodule 'third_party/python-six' (https://github.com/benjaminp/six.git) registered for path 'third_party/python-six' 2022-05-18T03:58:53.6138431Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2022-05-18T03:58:53.6146470Z Submodule 'third_party/tbb' (https://github.com/01org/tbb) registered for path 'third_party/tbb' 2022-05-18T03:58:53.6153659Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2022-05-18T03:58:53.6161323Z Submodule 'third_party/zstd' (https://github.com/facebook/zstd.git) registered for path 'third_party/zstd' 2022-05-18T03:58:53.6224205Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'... 2022-05-18T03:58:53.8494285Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'... 2022-05-18T03:58:54.0330041Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'... 2022-05-18T03:58:54.2066833Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'... 2022-05-18T03:58:54.4573574Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/QNNPACK'... 2022-05-18T03:58:54.7057573Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'... 2022-05-18T03:58:58.3491077Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'... 2022-05-18T03:58:58.6844058Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2022-05-18T03:58:59.1222830Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cub'... 2022-05-18T03:59:00.2523893Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2022-05-18T03:59:01.4668233Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/eigen'... 2022-05-18T03:59:06.5401006Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'... 2022-05-18T03:59:07.0217448Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2022-05-18T03:59:07.9926620Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'... 2022-05-18T03:59:08.9060822Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/foxi'... 2022-05-18T03:59:09.0858890Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2022-05-18T03:59:09.5112477Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'... 
2022-05-18T03:59:09.7773093Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'... 2022-05-18T03:59:10.5675230Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'... 2022-05-18T03:59:10.8816778Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ios-cmake'... 2022-05-18T03:59:11.0645319Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'... 2022-05-18T03:59:12.7407226Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nccl/nccl'... 2022-05-18T03:59:13.0761348Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/neon2sse'... 2022-05-18T03:59:13.4105073Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'... 2022-05-18T03:59:14.5541953Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt'... 2022-05-18T03:59:14.8902815Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'... 2022-05-18T03:59:15.0831644Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'... 2022-05-18T03:59:19.3829549Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'... 2022-05-18T03:59:19.5711160Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2022-05-18T03:59:19.7658634Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'... 2022-05-18T03:59:20.4198938Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-enum'... 2022-05-18T03:59:20.6148030Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'... 2022-05-18T03:59:20.8840551Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-six'... 2022-05-18T03:59:21.1406981Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'... 2022-05-18T03:59:21.6428639Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tbb'... 2022-05-18T03:59:23.4762267Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2022-05-18T03:59:24.9701084Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/zstd'... 
2022-05-18T03:59:26.5405728Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2022-05-18T03:59:26.5796416Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2022-05-18T03:59:26.6157106Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2022-05-18T03:59:26.6697195Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2022-05-18T03:59:26.7234935Z Submodule path 'third_party/QNNPACK': checked out '7d2a4e9931a82adc3814275b6219a03e24e36b4c' 2022-05-18T03:59:27.4915332Z Submodule path 'third_party/XNNPACK': checked out 'ae108ef49aa5623b896fc93d4298c49d1750d9ba' 2022-05-18T03:59:27.5450042Z Submodule path 'third_party/benchmark': checked out 'e991355c02b93fe17713efe04cbc2e278e00fdbd' 2022-05-18T03:59:27.6936270Z Submodule path 'third_party/cpuinfo': checked out '5916273f79a21551890fd3d56fc5375a78d1598d' 2022-05-18T03:59:27.7615637Z Submodule path 'third_party/cub': checked out 'd106ddb991a56c3df1b6d51b2409e36ba8181ce4' 2022-05-18T03:59:28.1598766Z Submodule path 'third_party/cudnn_frontend': checked out '43709ab96c47e26eebcdac72f93f946d44ceffa8' 2022-05-18T03:59:28.4849688Z Submodule path 'third_party/eigen': checked out '3147391d946bb4b6c68edd901f2add6ac1f31f8c' 2022-05-18T03:59:28.5658962Z Submodule path 'third_party/fbgemm': checked out '2e9be65810107a9595da717f95d21924b73be833' 2022-05-18T03:59:28.5709188Z Submodule 'third_party/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/third_party/asmjit' 2022-05-18T03:59:28.5712127Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T03:59:28.5715328Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/third_party/googletest' 2022-05-18T03:59:28.5760741Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/asmjit'... 2022-05-18T03:59:29.2538586Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/cpuinfo'... 2022-05-18T03:59:29.8009617Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/googletest'... 
2022-05-18T03:59:30.6621560Z Submodule path 'third_party/fbgemm/third_party/asmjit': checked out '8b35b4cffb62ecb58a903bf91cb7537d7a672211' 2022-05-18T03:59:30.8102489Z Submodule path 'third_party/fbgemm/third_party/cpuinfo': checked out 'ed8b86a253800bafdb7b25c5c399f91bff9cb1f3' 2022-05-18T03:59:30.9056944Z Submodule path 'third_party/fbgemm/third_party/googletest': checked out 'cbf019de22c8dd37b2108da35b2748fd702d1796' 2022-05-18T03:59:31.0374505Z Submodule path 'third_party/flatbuffers': checked out 'd0cede9c90c5257537c293517a21376408b549fa' 2022-05-18T03:59:31.1048032Z Submodule path 'third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2022-05-18T03:59:31.1411687Z Submodule path 'third_party/foxi': checked out 'c278588e34e535f0bb8f00df3880d26928038cad' 2022-05-18T03:59:31.2153039Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2022-05-18T03:59:31.2706688Z Submodule path 'third_party/gloo': checked out 'c22a5cfba94edf8ea4f53a174d38aa0c629d070f' 2022-05-18T03:59:31.3523910Z Submodule path 'third_party/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2022-05-18T03:59:31.3904577Z Submodule path 'third_party/ideep': checked out '02b17c5748c9349dcc586c359af800c684d9b1ab' 2022-05-18T03:59:31.3955731Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2022-05-18T03:59:31.4000578Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 2022-05-18T03:59:36.4666472Z Submodule path 'third_party/ideep/mkl-dnn': checked out '888a87a954e4fddb4d81fd10858eb834f2441b46' 2022-05-18T03:59:36.4730531Z Submodule 'third_party/oneDNN' (https://github.com/oneapi-src/oneDNN.git) registered for path 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T03:59:36.4778439Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN'... 2022-05-18T03:59:41.7977584Z Submodule path 'third_party/ideep/mkl-dnn/third_party/oneDNN': checked out '52b5f107dd9cf10910aaa19cb47f3abf9b349815' 2022-05-18T03:59:41.8390053Z Submodule path 'third_party/ios-cmake': checked out '8abaed637d56f1337d6e1d2c4026e25c1eade724' 2022-05-18T03:59:41.9772636Z Submodule path 'third_party/kineto': checked out 'b2b48c00c6e5bd8e807e2231adb229db6a1d1c22' 2022-05-18T03:59:41.9824853Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T03:59:41.9827941Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T03:59:41.9873421Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2022-05-18T03:59:42.9072603Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 
2022-05-18T03:59:43.7736617Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '2591ab91c3898c9f6544fff04660276537d32ffd' 2022-05-18T03:59:43.8629574Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347' 2022-05-18T03:59:43.9134853Z Submodule path 'third_party/nccl/nccl': checked out '7e515921295adaab72adf56ea71a0fafb0ecb5f3' 2022-05-18T03:59:43.9550704Z Submodule path 'third_party/neon2sse': checked out '97a126f08ce318023be604d03f88bf0820a9464a' 2022-05-18T03:59:44.2826178Z Submodule path 'third_party/onnx': checked out '96046b8ccfb8e6fa82f6b2b34b3d56add2e8849c' 2022-05-18T03:59:44.2892355Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx/third_party/benchmark' 2022-05-18T03:59:44.2895492Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2022-05-18T03:59:44.2955171Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/benchmark'... 2022-05-18T03:59:44.6400569Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 2022-05-18T03:59:45.3687931Z Submodule path 'third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508' 2022-05-18T03:59:45.4329423Z Submodule path 'third_party/onnx/third_party/pybind11': checked out '59a2ac2745d8a57ac94c6accced73620d59fb844' 2022-05-18T03:59:45.4768130Z Submodule path 'third_party/onnx-tensorrt': checked out 'c153211418a7c57ce071d9ce2a41f8d1c85a878f' 2022-05-18T03:59:45.4817733Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T03:59:45.4860090Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx'... 2022-05-18T03:59:46.8833551Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx': checked out '765f5ee823a67a866f4bd28a9860e81f3c811ce8' 2022-05-18T03:59:46.8899535Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T03:59:46.8902428Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T03:59:46.8955267Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'... 2022-05-18T03:59:47.2675233Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'... 2022-05-18T03:59:47.9908167Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508' 2022-05-18T03:59:48.0924259Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11': checked out 'a1041190c8b8ff0cd9e2f0752248ad5e3789ea0c' 2022-05-18T03:59:48.0982827Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T03:59:48.1027972Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'... 
2022-05-18T03:59:48.3290418Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2022-05-18T03:59:48.3670923Z Submodule path 'third_party/pocketfft': checked out 'ea778e37710c07723435b1be58235996d1d43a5a' 2022-05-18T03:59:48.7104341Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2022-05-18T03:59:48.7155232Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2022-05-18T03:59:48.7157982Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2022-05-18T03:59:48.7208809Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2022-05-18T03:59:49.0624781Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 2022-05-18T03:59:49.9108056Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2022-05-18T03:59:50.0178754Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2022-05-18T03:59:50.0546011Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2022-05-18T03:59:50.0934444Z Submodule path 'third_party/pthreadpool': checked out 'a134dd5d4cee80cce15db81a72e7f929d71dd413' 2022-05-18T03:59:50.1553924Z Submodule path 'third_party/pybind11': checked out '8de7772cc72daca8e947b79b83fea46214931604' 2022-05-18T03:59:50.1913011Z Submodule path 'third_party/python-enum': checked out '4cfedc426c4e2fc52e3f5c2b4297e15ed8d6b8c7' 2022-05-18T03:59:50.2517928Z Submodule path 'third_party/python-peachpy': checked out '07d8fde8ac45d7705129475c0f94ed8925b93473' 2022-05-18T03:59:50.2885338Z Submodule path 'third_party/python-six': checked out '15e31431af97e5e64b80af0a3f598d382bcdd49a' 2022-05-18T03:59:50.3673160Z Submodule path 'third_party/sleef': checked out 'e0a003ee838b75d11763aa9c3ef17bf71a725bff' 2022-05-18T03:59:50.5323956Z Submodule path 'third_party/tbb': checked out 'a51a90bc609bb73db8ea13841b5cf7aa4344d4a9' 2022-05-18T03:59:50.5921234Z Submodule path 'third_party/tensorpipe': checked out '52791a2fd214b2a9dc5759d36725909c1daa7f2e' 2022-05-18T03:59:50.5971185Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2022-05-18T03:59:50.5974567Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2022-05-18T03:59:50.5977640Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2022-05-18T03:59:50.5980951Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T03:59:50.6025278Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2022-05-18T03:59:51.4225196Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 2022-05-18T03:59:51.7964657Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 
2022-05-18T03:59:52.8150864Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2022-05-18T03:59:53.5647395Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2022-05-18T03:59:53.6082765Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2022-05-18T03:59:53.7140972Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '1dff88e5161cba5c59276d2070d2e304e4dcb242' 2022-05-18T03:59:53.7732773Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2022-05-18T03:59:53.7792985Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T03:59:53.7838109Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 2022-05-18T03:59:54.0018045Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2022-05-18T03:59:54.1856291Z Submodule path 'third_party/zstd': checked out 'aec56a52fbab207fc639a1937d1e708a282edca8' 2022-05-18T03:59:54.1946627Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2022-05-18T03:59:54.2272682Z Entering 'android/libs/fbjni' 2022-05-18T03:59:54.2315215Z Entering 'third_party/FP16' 2022-05-18T03:59:54.2357653Z Entering 'third_party/FXdiv' 2022-05-18T03:59:54.2399305Z Entering 'third_party/NNPACK' 2022-05-18T03:59:54.2441683Z Entering 'third_party/QNNPACK' 2022-05-18T03:59:54.2483093Z Entering 'third_party/XNNPACK' 2022-05-18T03:59:54.2536717Z Entering 'third_party/benchmark' 2022-05-18T03:59:54.2578254Z Entering 'third_party/cpuinfo' 2022-05-18T03:59:54.2621507Z Entering 'third_party/cub' 2022-05-18T03:59:54.2666399Z Entering 'third_party/cudnn_frontend' 2022-05-18T03:59:54.2714103Z Entering 'third_party/eigen' 2022-05-18T03:59:54.2758963Z Entering 'third_party/fbgemm' 2022-05-18T03:59:54.2800897Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T03:59:54.2844269Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T03:59:54.2886940Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T03:59:54.2929033Z Entering 'third_party/flatbuffers' 2022-05-18T03:59:54.2972816Z Entering 'third_party/fmt' 2022-05-18T03:59:54.3013847Z Entering 'third_party/foxi' 2022-05-18T03:59:54.3054790Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T03:59:54.3096806Z Entering 'third_party/gloo' 2022-05-18T03:59:54.3138913Z Entering 'third_party/googletest' 2022-05-18T03:59:54.3180469Z Entering 'third_party/ideep' 2022-05-18T03:59:54.3221344Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T03:59:54.3264664Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T03:59:54.3313122Z Entering 'third_party/ios-cmake' 2022-05-18T03:59:54.3354373Z Entering 'third_party/kineto' 2022-05-18T03:59:54.3396249Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T03:59:54.3438114Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T03:59:54.3480109Z Entering 'third_party/nccl/nccl' 2022-05-18T03:59:54.3522157Z Entering 'third_party/neon2sse' 2022-05-18T03:59:54.3563397Z Entering 'third_party/onnx' 2022-05-18T03:59:54.3618318Z Entering 'third_party/onnx/third_party/benchmark' 
2022-05-18T03:59:54.3659733Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T03:59:54.3703600Z Entering 'third_party/onnx-tensorrt' 2022-05-18T03:59:54.3745062Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T03:59:54.3792367Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T03:59:54.3833542Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T03:59:54.3875121Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T03:59:54.3922557Z Entering 'third_party/pocketfft' 2022-05-18T03:59:54.3963712Z Entering 'third_party/protobuf' 2022-05-18T03:59:54.4009041Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T03:59:54.4050267Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T03:59:54.4093615Z Entering 'third_party/psimd' 2022-05-18T03:59:54.4135177Z Entering 'third_party/pthreadpool' 2022-05-18T03:59:54.4176843Z Entering 'third_party/pybind11' 2022-05-18T03:59:54.4218581Z Entering 'third_party/python-enum' 2022-05-18T03:59:54.4259715Z Entering 'third_party/python-peachpy' 2022-05-18T03:59:54.4300648Z Entering 'third_party/python-six' 2022-05-18T03:59:54.4341925Z Entering 'third_party/sleef' 2022-05-18T03:59:54.4383780Z Entering 'third_party/tbb' 2022-05-18T03:59:54.4428316Z Entering 'third_party/tensorpipe' 2022-05-18T03:59:54.4471257Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T03:59:54.4513120Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T03:59:54.4553557Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T03:59:54.4595147Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T03:59:54.4636572Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T03:59:54.4682088Z Entering 'third_party/zstd' 2022-05-18T03:59:54.4734042Z ##[endgroup] 2022-05-18T03:59:54.4737415Z ##[group]Persisting credentials for submodules 2022-05-18T03:59:54.4745390Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || : 2022-05-18T03:59:54.5062251Z Entering 'android/libs/fbjni' 2022-05-18T03:59:54.5103161Z Entering 'third_party/FP16' 2022-05-18T03:59:54.5145613Z Entering 'third_party/FXdiv' 2022-05-18T03:59:54.5185524Z Entering 'third_party/NNPACK' 2022-05-18T03:59:54.5225860Z Entering 'third_party/QNNPACK' 2022-05-18T03:59:54.5266689Z Entering 'third_party/XNNPACK' 2022-05-18T03:59:54.5318218Z Entering 'third_party/benchmark' 2022-05-18T03:59:54.5358794Z Entering 'third_party/cpuinfo' 2022-05-18T03:59:54.5400226Z Entering 'third_party/cub' 2022-05-18T03:59:54.5441121Z Entering 'third_party/cudnn_frontend' 2022-05-18T03:59:54.5488780Z Entering 'third_party/eigen' 2022-05-18T03:59:54.5549196Z Entering 'third_party/fbgemm' 2022-05-18T03:59:54.5572781Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T03:59:54.5613867Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T03:59:54.5655055Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T03:59:54.5697492Z Entering 'third_party/flatbuffers' 2022-05-18T03:59:54.5741224Z Entering 'third_party/fmt' 2022-05-18T03:59:54.5782503Z Entering 'third_party/foxi' 2022-05-18T03:59:54.5822819Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T03:59:54.5865371Z Entering 'third_party/gloo' 2022-05-18T03:59:54.5905461Z Entering 
'third_party/googletest' 2022-05-18T03:59:54.5947506Z Entering 'third_party/ideep' 2022-05-18T03:59:54.5986853Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T03:59:54.6029186Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T03:59:54.6077546Z Entering 'third_party/ios-cmake' 2022-05-18T03:59:54.6117769Z Entering 'third_party/kineto' 2022-05-18T03:59:54.6157578Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T03:59:54.6198769Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T03:59:54.6240973Z Entering 'third_party/nccl/nccl' 2022-05-18T03:59:54.6283036Z Entering 'third_party/neon2sse' 2022-05-18T03:59:54.6323748Z Entering 'third_party/onnx' 2022-05-18T03:59:54.6375872Z Entering 'third_party/onnx/third_party/benchmark' 2022-05-18T03:59:54.6417115Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T03:59:54.6460997Z Entering 'third_party/onnx-tensorrt' 2022-05-18T03:59:54.6501596Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T03:59:54.6547305Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T03:59:54.6588257Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T03:59:54.6628822Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T03:59:54.6675458Z Entering 'third_party/pocketfft' 2022-05-18T03:59:54.6716015Z Entering 'third_party/protobuf' 2022-05-18T03:59:54.6760167Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T03:59:54.6800964Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T03:59:54.6843742Z Entering 'third_party/psimd' 2022-05-18T03:59:54.6884447Z Entering 'third_party/pthreadpool' 2022-05-18T03:59:54.6925774Z Entering 'third_party/pybind11' 2022-05-18T03:59:54.6966595Z Entering 'third_party/python-enum' 2022-05-18T03:59:54.7007308Z Entering 'third_party/python-peachpy' 2022-05-18T03:59:54.7047752Z Entering 'third_party/python-six' 2022-05-18T03:59:54.7087999Z Entering 'third_party/sleef' 2022-05-18T03:59:54.7128299Z Entering 'third_party/tbb' 2022-05-18T03:59:54.7171000Z Entering 'third_party/tensorpipe' 2022-05-18T03:59:54.7212393Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T03:59:54.7253867Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T03:59:54.7293609Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T03:59:54.7334735Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T03:59:54.7374973Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T03:59:54.7418310Z Entering 'third_party/zstd' 2022-05-18T03:59:54.7474796Z [command]/usr/bin/git submodule foreach --recursive git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url 2022-05-18T03:59:54.7791914Z Entering 'android/libs/fbjni' 2022-05-18T03:59:54.7830408Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2022-05-18T03:59:54.7846690Z Entering 'third_party/FP16' 2022-05-18T03:59:54.7886272Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2022-05-18T03:59:54.7902368Z Entering 'third_party/FXdiv' 2022-05-18T03:59:54.7940235Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 
2022-05-18T03:59:54.7957027Z Entering 'third_party/NNPACK' 2022-05-18T03:59:54.7995675Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2022-05-18T03:59:54.8012279Z Entering 'third_party/QNNPACK' 2022-05-18T03:59:54.8051402Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/QNNPACK/config remote.origin.url 2022-05-18T03:59:54.8069001Z Entering 'third_party/XNNPACK' 2022-05-18T03:59:54.8107735Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2022-05-18T03:59:54.8135556Z Entering 'third_party/benchmark' 2022-05-18T03:59:54.8174708Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2022-05-18T03:59:54.8191667Z Entering 'third_party/cpuinfo' 2022-05-18T03:59:54.8229649Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2022-05-18T03:59:54.8247069Z Entering 'third_party/cub' 2022-05-18T03:59:54.8285317Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cub/config remote.origin.url 2022-05-18T03:59:54.8302792Z Entering 'third_party/cudnn_frontend' 2022-05-18T03:59:54.8341203Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2022-05-18T03:59:54.8363975Z Entering 'third_party/eigen' 2022-05-18T03:59:54.8401438Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/eigen/config remote.origin.url 2022-05-18T03:59:54.8420290Z Entering 'third_party/fbgemm' 2022-05-18T03:59:54.8458347Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2022-05-18T03:59:54.8475239Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T03:59:54.8513256Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/asmjit/config remote.origin.url 2022-05-18T03:59:54.8529813Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T03:59:54.8567906Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/cpuinfo/config remote.origin.url 2022-05-18T03:59:54.8585530Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T03:59:54.8622727Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/googletest/config remote.origin.url 2022-05-18T03:59:54.8641520Z Entering 'third_party/flatbuffers' 2022-05-18T03:59:54.8680420Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2022-05-18T03:59:54.8699364Z Entering 'third_party/fmt' 2022-05-18T03:59:54.8737034Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2022-05-18T03:59:54.8754311Z Entering 'third_party/foxi' 2022-05-18T03:59:54.8791705Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/foxi/config remote.origin.url 2022-05-18T03:59:54.8807957Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T03:59:54.8846679Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2022-05-18T03:59:54.8863604Z Entering 'third_party/gloo' 2022-05-18T03:59:54.8900982Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2022-05-18T03:59:54.8918337Z Entering 'third_party/googletest' 2022-05-18T03:59:54.8956172Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2022-05-18T03:59:54.8972768Z Entering 'third_party/ideep' 2022-05-18T03:59:54.9010729Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2022-05-18T03:59:54.9027081Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T03:59:54.9064007Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2022-05-18T03:59:54.9082974Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T03:59:54.9121135Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/modules/third_party/oneDNN/config remote.origin.url 2022-05-18T03:59:54.9144243Z Entering 'third_party/ios-cmake' 2022-05-18T03:59:54.9182134Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ios-cmake/config remote.origin.url 2022-05-18T03:59:54.9198884Z Entering 'third_party/kineto' 2022-05-18T03:59:54.9237098Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2022-05-18T03:59:54.9253530Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T03:59:54.9292415Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2022-05-18T03:59:54.9309866Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T03:59:54.9348491Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2022-05-18T03:59:54.9366866Z Entering 'third_party/nccl/nccl' 2022-05-18T03:59:54.9405063Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nccl/nccl/config remote.origin.url 2022-05-18T03:59:54.9421711Z Entering 'third_party/neon2sse' 2022-05-18T03:59:54.9459099Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/neon2sse/config remote.origin.url 2022-05-18T03:59:54.9476208Z Entering 'third_party/onnx' 2022-05-18T03:59:54.9514147Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2022-05-18T03:59:54.9542163Z Entering 'third_party/onnx/third_party/benchmark' 2022-05-18T03:59:54.9580160Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url 2022-05-18T03:59:54.9597352Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T03:59:54.9635381Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2022-05-18T03:59:54.9654427Z Entering 'third_party/onnx-tensorrt' 2022-05-18T03:59:54.9692330Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/config remote.origin.url 2022-05-18T03:59:54.9709701Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T03:59:54.9747842Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/config remote.origin.url 
2022-05-18T03:59:54.9769465Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T03:59:54.9807982Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url 2022-05-18T03:59:54.9825611Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T03:59:54.9863686Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2022-05-18T03:59:54.9881421Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T03:59:54.9920446Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2022-05-18T03:59:54.9941855Z Entering 'third_party/pocketfft' 2022-05-18T03:59:54.9979974Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2022-05-18T03:59:54.9996930Z Entering 'third_party/protobuf' 2022-05-18T03:59:55.0035103Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2022-05-18T03:59:55.0055102Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T03:59:55.0093240Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2022-05-18T03:59:55.0110315Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T03:59:55.0148993Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2022-05-18T03:59:55.0167632Z Entering 'third_party/psimd' 2022-05-18T03:59:55.0205739Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2022-05-18T03:59:55.0221787Z Entering 'third_party/pthreadpool' 2022-05-18T03:59:55.0259707Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2022-05-18T03:59:55.0277013Z Entering 'third_party/pybind11' 2022-05-18T03:59:55.0315200Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2022-05-18T03:59:55.0331840Z Entering 'third_party/python-enum' 2022-05-18T03:59:55.0369621Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-enum/config remote.origin.url 2022-05-18T03:59:55.0387266Z Entering 'third_party/python-peachpy' 2022-05-18T03:59:55.0425391Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2022-05-18T03:59:55.0442277Z Entering 'third_party/python-six' 2022-05-18T03:59:55.0479930Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-six/config remote.origin.url 2022-05-18T03:59:55.0496194Z Entering 'third_party/sleef' 2022-05-18T03:59:55.0534713Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2022-05-18T03:59:55.0553038Z Entering 'third_party/tbb' 2022-05-18T03:59:55.0590544Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tbb/config 
remote.origin.url 2022-05-18T03:59:55.0609330Z Entering 'third_party/tensorpipe' 2022-05-18T03:59:55.0648532Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2022-05-18T03:59:55.0665797Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T03:59:55.0703269Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2022-05-18T03:59:55.0721866Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T03:59:55.0760027Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2022-05-18T03:59:55.0776060Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T03:59:55.0813617Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2022-05-18T03:59:55.0831152Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T03:59:55.0869420Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2022-05-18T03:59:55.0885499Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T03:59:55.0923220Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2022-05-18T03:59:55.0942404Z Entering 'third_party/zstd' 2022-05-18T03:59:55.0980001Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/zstd/config remote.origin.url 2022-05-18T03:59:55.1877932Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2022-05-18T03:59:55.2194734Z Entering 'android/libs/fbjni' 2022-05-18T03:59:55.2236484Z Entering 'third_party/FP16' 2022-05-18T03:59:55.2277926Z Entering 'third_party/FXdiv' 2022-05-18T03:59:55.2319312Z Entering 'third_party/NNPACK' 2022-05-18T03:59:55.2361495Z Entering 'third_party/QNNPACK' 2022-05-18T03:59:55.2402388Z Entering 'third_party/XNNPACK' 2022-05-18T03:59:55.2455372Z Entering 'third_party/benchmark' 2022-05-18T03:59:55.2496942Z Entering 'third_party/cpuinfo' 2022-05-18T03:59:55.2539760Z Entering 'third_party/cub' 2022-05-18T03:59:55.2582385Z Entering 'third_party/cudnn_frontend' 2022-05-18T03:59:55.2630861Z Entering 'third_party/eigen' 2022-05-18T03:59:55.2675200Z Entering 'third_party/fbgemm' 2022-05-18T03:59:55.2716545Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T03:59:55.2757985Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T03:59:55.2799458Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T03:59:55.2842540Z Entering 'third_party/flatbuffers' 2022-05-18T03:59:55.2885766Z Entering 'third_party/fmt' 2022-05-18T03:59:55.2927308Z Entering 'third_party/foxi' 2022-05-18T03:59:55.2969693Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T03:59:55.3011154Z Entering 'third_party/gloo' 2022-05-18T03:59:55.3053391Z Entering 'third_party/googletest' 2022-05-18T03:59:55.3095232Z Entering 'third_party/ideep' 2022-05-18T03:59:55.3136431Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T03:59:55.3179421Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T03:59:55.3227973Z Entering 'third_party/ios-cmake' 2022-05-18T03:59:55.3270186Z Entering 
'third_party/kineto' 2022-05-18T03:59:55.3311739Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T03:59:55.3353339Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T03:59:55.3395346Z Entering 'third_party/nccl/nccl' 2022-05-18T03:59:55.3438044Z Entering 'third_party/neon2sse' 2022-05-18T03:59:55.3478923Z Entering 'third_party/onnx' 2022-05-18T03:59:55.3533582Z Entering 'third_party/onnx/third_party/benchmark' 2022-05-18T03:59:55.3575703Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T03:59:55.3619188Z Entering 'third_party/onnx-tensorrt' 2022-05-18T03:59:55.3660015Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T03:59:55.3706108Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T03:59:55.3748395Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T03:59:55.3790339Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T03:59:55.3836007Z Entering 'third_party/pocketfft' 2022-05-18T03:59:55.3876702Z Entering 'third_party/protobuf' 2022-05-18T03:59:55.3922374Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T03:59:55.3963155Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T03:59:55.4005802Z Entering 'third_party/psimd' 2022-05-18T03:59:55.4047865Z Entering 'third_party/pthreadpool' 2022-05-18T03:59:55.4090836Z Entering 'third_party/pybind11' 2022-05-18T03:59:55.4132909Z Entering 'third_party/python-enum' 2022-05-18T03:59:55.4174487Z Entering 'third_party/python-peachpy' 2022-05-18T03:59:55.4215701Z Entering 'third_party/python-six' 2022-05-18T03:59:55.4257097Z Entering 'third_party/sleef' 2022-05-18T03:59:55.4299901Z Entering 'third_party/tbb' 2022-05-18T03:59:55.4344130Z Entering 'third_party/tensorpipe' 2022-05-18T03:59:55.4385872Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T03:59:55.4427159Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T03:59:55.4469407Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T03:59:55.4511506Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T03:59:55.4551540Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T03:59:55.4595887Z Entering 'third_party/zstd' 2022-05-18T03:59:55.4650223Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2022-05-18T03:59:55.4970066Z Entering 'android/libs/fbjni' 2022-05-18T03:59:55.5011147Z Entering 'third_party/FP16' 2022-05-18T03:59:55.5054354Z Entering 'third_party/FXdiv' 2022-05-18T03:59:55.5096913Z Entering 'third_party/NNPACK' 2022-05-18T03:59:55.5138876Z Entering 'third_party/QNNPACK' 2022-05-18T03:59:55.5182652Z Entering 'third_party/XNNPACK' 2022-05-18T03:59:55.5235854Z Entering 'third_party/benchmark' 2022-05-18T03:59:55.5278104Z Entering 'third_party/cpuinfo' 2022-05-18T03:59:55.5320704Z Entering 'third_party/cub' 2022-05-18T03:59:55.5363086Z Entering 'third_party/cudnn_frontend' 2022-05-18T03:59:55.5411106Z Entering 'third_party/eigen' 2022-05-18T03:59:55.5455673Z Entering 'third_party/fbgemm' 2022-05-18T03:59:55.5498112Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T03:59:55.5539191Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T03:59:55.5581623Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T03:59:55.5624039Z Entering 'third_party/flatbuffers' 2022-05-18T03:59:55.5668684Z 
Entering 'third_party/fmt' 2022-05-18T03:59:55.5710718Z Entering 'third_party/foxi' 2022-05-18T03:59:55.5752542Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T03:59:55.5795080Z Entering 'third_party/gloo' 2022-05-18T03:59:55.5837011Z Entering 'third_party/googletest' 2022-05-18T03:59:55.5879464Z Entering 'third_party/ideep' 2022-05-18T03:59:55.5921120Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T03:59:55.5964455Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T03:59:55.6011938Z Entering 'third_party/ios-cmake' 2022-05-18T03:59:55.6054799Z Entering 'third_party/kineto' 2022-05-18T03:59:55.6096469Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T03:59:55.6138196Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T03:59:55.6181446Z Entering 'third_party/nccl/nccl' 2022-05-18T03:59:55.6223391Z Entering 'third_party/neon2sse' 2022-05-18T03:59:55.6265131Z Entering 'third_party/onnx' 2022-05-18T03:59:55.6318921Z Entering 'third_party/onnx/third_party/benchmark' 2022-05-18T03:59:55.6361595Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T03:59:55.6405274Z Entering 'third_party/onnx-tensorrt' 2022-05-18T03:59:55.6447681Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T03:59:55.6494499Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T03:59:55.6535748Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T03:59:55.6577119Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T03:59:55.6623101Z Entering 'third_party/pocketfft' 2022-05-18T03:59:55.6666580Z Entering 'third_party/protobuf' 2022-05-18T03:59:55.6711928Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T03:59:55.6754857Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T03:59:55.6797734Z Entering 'third_party/psimd' 2022-05-18T03:59:55.6839234Z Entering 'third_party/pthreadpool' 2022-05-18T03:59:55.6881660Z Entering 'third_party/pybind11' 2022-05-18T03:59:55.6923999Z Entering 'third_party/python-enum' 2022-05-18T03:59:55.6967001Z Entering 'third_party/python-peachpy' 2022-05-18T03:59:55.7008872Z Entering 'third_party/python-six' 2022-05-18T03:59:55.7051141Z Entering 'third_party/sleef' 2022-05-18T03:59:55.7093244Z Entering 'third_party/tbb' 2022-05-18T03:59:55.7137359Z Entering 'third_party/tensorpipe' 2022-05-18T03:59:55.7181007Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T03:59:55.7222441Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T03:59:55.7262901Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T03:59:55.7304818Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T03:59:55.7347085Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T03:59:55.7390943Z Entering 'third_party/zstd' 2022-05-18T03:59:55.7442761Z ##[endgroup] 2022-05-18T03:59:55.7488510Z [command]/usr/bin/git log -1 --format='%H' 2022-05-18T03:59:55.7517658Z '3b2375291aab7b48442f2e6fb1ef66cebc761e24' 2022-05-18T03:59:55.7666385Z Prepare all required actions 2022-05-18T03:59:55.7696239Z ##[group]Run ./.github/actions/setup-linux 2022-05-18T03:59:55.7696521Z env: 2022-05-18T03:59:55.7696726Z IN_CI: 1 2022-05-18T03:59:55.7696953Z IS_GHA: 1 2022-05-18T03:59:55.7697204Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:59:55.7697447Z ##[endgroup] 2022-05-18T03:59:55.7715032Z ##[group]Run set -euo pipefail 2022-05-18T03:59:55.7715355Z set -euo pipefail 
2022-05-18T03:59:55.7715628Z function get_ec2_metadata() { 2022-05-18T03:59:55.7715965Z  # Pulled from instance metadata endpoint for EC2 2022-05-18T03:59:55.7716442Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2022-05-18T03:59:55.7716836Z  category=$1 2022-05-18T03:59:55.7717172Z  curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2022-05-18T03:59:55.7717481Z } 2022-05-18T03:59:55.7717790Z echo "ami-id: $(get_ec2_metadata ami-id)" 2022-05-18T03:59:55.7718143Z echo "instance-id: $(get_ec2_metadata instance-id)" 2022-05-18T03:59:55.7718500Z echo "instance-type: $(get_ec2_metadata instance-type)" 2022-05-18T03:59:55.7718841Z echo "system info $(uname -a)" 2022-05-18T03:59:55.7732364Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T03:59:55.7732642Z env: 2022-05-18T03:59:55.7732862Z IN_CI: 1 2022-05-18T03:59:55.7733084Z IS_GHA: 1 2022-05-18T03:59:55.7733314Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:59:55.7733573Z ##[endgroup] 2022-05-18T03:59:55.7838951Z ami-id: ami-096198a0bccc6bad4 2022-05-18T03:59:55.7901789Z instance-id: i-023c3009b9c09a97d 2022-05-18T03:59:55.7963706Z instance-type: g3.8xlarge 2022-05-18T03:59:55.7971676Z system info Linux ip-10-0-4-191.ec2.internal 4.14.252-195.483.amzn2.x86_64 #1 SMP Mon Nov 1 20:58:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux 2022-05-18T03:59:55.7989995Z ##[group]Run if systemctl is-active --quiet docker; then 2022-05-18T03:59:55.7990400Z if systemctl is-active --quiet docker; then 2022-05-18T03:59:55.7990723Z  echo "Docker daemon is running..."; 2022-05-18T03:59:55.7991004Z else 2022-05-18T03:59:55.7991324Z  echo "Starting docker deamon..." && sudo systemctl start docker; 2022-05-18T03:59:55.7991620Z fi 2022-05-18T03:59:55.8004028Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T03:59:55.8004328Z env: 2022-05-18T03:59:55.8004530Z IN_CI: 1 2022-05-18T03:59:55.8004755Z IS_GHA: 1 2022-05-18T03:59:55.8005003Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:59:55.8005243Z ##[endgroup] 2022-05-18T03:59:55.8054134Z Docker daemon is running... 2022-05-18T03:59:55.8072543Z ##[group]Run AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2022-05-18T03:59:55.8073017Z AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2022-05-18T03:59:55.8073394Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2022-05-18T03:59:55.8073891Z retry aws ecr get-login*** "$AWS_DEFAULT_REGION" | docker login --username AWS \ 2022-05-18T03:59:55.8074359Z  --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" 2022-05-18T03:59:55.8086328Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T03:59:55.8086609Z env: 2022-05-18T03:59:55.8086834Z IN_CI: 1 2022-05-18T03:59:55.8087060Z IS_GHA: 1 2022-05-18T03:59:55.8087294Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:59:55.8087570Z AWS_RETRY_MODE: standard 2022-05-18T03:59:55.8087832Z AWS_MAX_ATTEMPTS: 5 2022-05-18T03:59:55.8088087Z AWS_DEFAULT_REGION: us-east-1 2022-05-18T03:59:55.8088361Z ##[endgroup] 2022-05-18T03:59:56.7561686Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2022-05-18T03:59:56.7562174Z Configure a credential helper to remove this warning. 
See 2022-05-18T03:59:56.7563022Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2022-05-18T03:59:56.7563362Z 2022-05-18T03:59:56.7563985Z Login Succeeded 2022-05-18T03:59:56.7602831Z ##[group]Run env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" 2022-05-18T03:59:56.7603217Z env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" 2022-05-18T03:59:56.7616676Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T03:59:56.7616979Z env: 2022-05-18T03:59:56.7617182Z IN_CI: 1 2022-05-18T03:59:56.7617407Z IS_GHA: 1 2022-05-18T03:59:56.7617657Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:59:56.7617900Z ##[endgroup] 2022-05-18T03:59:56.7680295Z Prepare all required actions 2022-05-18T03:59:56.7680637Z Getting action download info 2022-05-18T03:59:56.8989859Z Download action repository 'seemethere/add-github-ssh-key@v1' (SHA:1ecffedb1e192a50aa67dba2f0e048e5d3bfa144) 2022-05-18T03:59:57.0188500Z ##[group]Run ./.github/actions/setup-ssh 2022-05-18T03:59:57.0188760Z with: 2022-05-18T03:59:57.0189180Z github-secret: *** 2022-05-18T03:59:57.0189425Z env: 2022-05-18T03:59:57.0189646Z IN_CI: 1 2022-05-18T03:59:57.0189854Z IS_GHA: 1 2022-05-18T03:59:57.0190103Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:59:57.0190360Z ##[endgroup] 2022-05-18T03:59:57.0214227Z ##[group]Run seemethere/add-github-ssh-key@v1 2022-05-18T03:59:57.0214519Z with: 2022-05-18T03:59:57.0214889Z GITHUB_TOKEN: *** 2022-05-18T03:59:57.0215145Z activate-with-label: false 2022-05-18T03:59:57.0215413Z label: with-ssh 2022-05-18T03:59:57.0215680Z remove-existing-keys: true 2022-05-18T03:59:57.0215915Z env: 2022-05-18T03:59:57.0216127Z IN_CI: 1 2022-05-18T03:59:57.0216390Z IS_GHA: 1 2022-05-18T03:59:57.0216622Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:59:57.0216887Z ##[endgroup] 2022-05-18T03:59:57.0912506Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys 2022-05-18T03:59:57.0960410Z Prepare all required actions 2022-05-18T03:59:57.0980029Z ##[group]Run ./.github/actions/pull-docker-image 2022-05-18T03:59:57.0980320Z with: 2022-05-18T03:59:57.0980794Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T03:59:57.0981268Z env: 2022-05-18T03:59:57.0981484Z IN_CI: 1 2022-05-18T03:59:57.0981691Z IS_GHA: 1 2022-05-18T03:59:57.0981939Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:59:57.0982197Z ##[endgroup] 2022-05-18T03:59:57.0997224Z ##[group]Run retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2022-05-18T03:59:57.0997592Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2022-05-18T03:59:57.0997930Z retry docker pull "${DOCKER_IMAGE}" 2022-05-18T03:59:57.1010980Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T03:59:57.1011282Z env: 2022-05-18T03:59:57.1011507Z IN_CI: 1 2022-05-18T03:59:57.1011753Z IS_GHA: 1 2022-05-18T03:59:57.1011990Z GIT_DEFAULT_BRANCH: master 2022-05-18T03:59:57.1012518Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T03:59:57.1013010Z ##[endgroup] 2022-05-18T03:59:57.3258622Z 6deab82db6a72ca54cd3e3322ee4f13864536734: Pulling from pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7 2022-05-18T03:59:57.3259436Z 11323ed2c653: Pulling fs layer 2022-05-18T03:59:57.3259937Z 9b0c32b3202c: Pulling fs layer 2022-05-18T03:59:57.3260372Z 55d4aa3df964: Pulling fs layer 
2022-05-18T03:59:57.3260659Z ced0e45f533f: Pulling fs layer 2022-05-18T03:59:57.3260916Z a6d5f855f26c: Pulling fs layer 2022-05-18T03:59:57.3261187Z 532188ad0a5d: Pulling fs layer 2022-05-18T03:59:57.3261455Z 53b0132b34a2: Pulling fs layer 2022-05-18T03:59:57.3261720Z d63f711e9949: Pulling fs layer 2022-05-18T03:59:57.3261992Z 776e7a7e28b2: Pulling fs layer 2022-05-18T03:59:57.3262260Z 69004237646f: Pulling fs layer 2022-05-18T03:59:57.3262506Z a0a6f96a62d8: Pulling fs layer 2022-05-18T03:59:57.3262772Z 7918ac79e586: Pulling fs layer 2022-05-18T03:59:57.3263025Z ced0e45f533f: Waiting 2022-05-18T03:59:57.3263282Z 517f3f32e512: Pulling fs layer 2022-05-18T03:59:57.3263517Z 53b0132b34a2: Waiting 2022-05-18T03:59:57.3264129Z 7c88fb71bf11: Pulling fs layer 2022-05-18T03:59:57.3264400Z 7b920d7a1988: Pulling fs layer 2022-05-18T03:59:57.3264650Z 0ba8a6800faf: Pulling fs layer 2022-05-18T03:59:57.3264903Z d63f711e9949: Waiting 2022-05-18T03:59:57.3265159Z 6d58a87851d7: Pulling fs layer 2022-05-18T03:59:57.3265437Z a6d5f855f26c: Waiting 2022-05-18T03:59:57.3265692Z b06b299e7454: Pulling fs layer 2022-05-18T03:59:57.3265955Z b046a45d4ca8: Pulling fs layer 2022-05-18T03:59:57.3266247Z 776e7a7e28b2: Waiting 2022-05-18T03:59:57.3266735Z acf3886a01ad: Pulling fs layer 2022-05-18T03:59:57.3267239Z 166228572fc8: Pulling fs layer 2022-05-18T03:59:57.3267679Z 532188ad0a5d: Waiting 2022-05-18T03:59:57.3268180Z 6d680b004bdb: Pulling fs layer 2022-05-18T03:59:57.3268715Z 4d9d54d04be5: Pulling fs layer 2022-05-18T03:59:57.3269238Z 55e19101ee96: Pulling fs layer 2022-05-18T03:59:57.3269699Z d57378452c6c: Pulling fs layer 2022-05-18T03:59:57.3270109Z 4097195e70a4: Pulling fs layer 2022-05-18T03:59:57.3270376Z e90775d597ae: Pulling fs layer 2022-05-18T03:59:57.3270618Z 0ba8a6800faf: Waiting 2022-05-18T03:59:57.3270872Z 342cb5b8793f: Pulling fs layer 2022-05-18T03:59:57.3271124Z acf3886a01ad: Waiting 2022-05-18T03:59:57.3271457Z ec9f4694245d: Pulling fs layer 2022-05-18T03:59:57.3271951Z 5ff41a564c23: Pulling fs layer 2022-05-18T03:59:57.3272380Z 6d58a87851d7: Waiting 2022-05-18T03:59:57.3272799Z a0a6f96a62d8: Waiting 2022-05-18T03:59:57.3273240Z b046a45d4ca8: Waiting 2022-05-18T03:59:57.3273676Z 7b920d7a1988: Waiting 2022-05-18T03:59:57.3274096Z 69004237646f: Waiting 2022-05-18T03:59:57.3274557Z b06b299e7454: Waiting 2022-05-18T03:59:57.3274989Z 7c88fb71bf11: Waiting 2022-05-18T03:59:57.3275619Z 7918ac79e586: Waiting 2022-05-18T03:59:57.3276071Z 166228572fc8: Waiting 2022-05-18T03:59:57.3276324Z 517f3f32e512: Waiting 2022-05-18T03:59:57.3276544Z 342cb5b8793f: Waiting 2022-05-18T03:59:57.3276781Z 4d9d54d04be5: Waiting 2022-05-18T03:59:57.3277015Z ec9f4694245d: Waiting 2022-05-18T03:59:57.3277255Z 5e9e1c5c2b02: Pulling fs layer 2022-05-18T03:59:57.3277529Z 85cae8860e8b: Pulling fs layer 2022-05-18T03:59:57.3277803Z 7bd074c80c3f: Pulling fs layer 2022-05-18T03:59:57.3278039Z 4097195e70a4: Waiting 2022-05-18T03:59:57.3278276Z 5e9e1c5c2b02: Waiting 2022-05-18T03:59:57.3278528Z 7ebce38575d6: Pulling fs layer 2022-05-18T03:59:57.3278783Z 3dcf0fc78ba8: Pulling fs layer 2022-05-18T03:59:57.3279058Z de93ffc12e40: Pulling fs layer 2022-05-18T03:59:57.3279312Z 6d680b004bdb: Waiting 2022-05-18T03:59:57.3279535Z 3dcf0fc78ba8: Waiting 2022-05-18T03:59:57.3279799Z 55e19101ee96: Waiting 2022-05-18T03:59:57.3280164Z fd0f553736b3: Pulling fs layer 2022-05-18T03:59:57.3280430Z 6b52bc4fc524: Pulling fs layer 2022-05-18T03:59:57.3280671Z 7bd074c80c3f: Waiting 2022-05-18T03:59:57.3280908Z e90775d597ae: Waiting 
2022-05-18T03:59:57.3281162Z f709baccd3f5: Pulling fs layer 2022-05-18T03:59:57.3281418Z 25dff8b9a054: Pulling fs layer 2022-05-18T03:59:57.3281689Z bcd88fe424d2: Pulling fs layer 2022-05-18T03:59:57.3281957Z 8710652e57c7: Pulling fs layer 2022-05-18T03:59:57.3282202Z 050758b5b900: Pulling fs layer 2022-05-18T03:59:57.3282468Z e104e8ddd08b: Pulling fs layer 2022-05-18T03:59:57.3282735Z b0c972c96382: Pulling fs layer 2022-05-18T03:59:57.3283013Z 053d59c76970: Pulling fs layer 2022-05-18T03:59:57.3283509Z 30dcacd2ffe2: Pulling fs layer 2022-05-18T03:59:57.3284004Z 1c1fd12e267d: Pulling fs layer 2022-05-18T03:59:57.3284249Z de93ffc12e40: Waiting 2022-05-18T03:59:57.3284489Z 050758b5b900: Waiting 2022-05-18T03:59:57.3284722Z fd0f553736b3: Waiting 2022-05-18T03:59:57.3284942Z e104e8ddd08b: Waiting 2022-05-18T03:59:57.3285189Z bcd88fe424d2: Waiting 2022-05-18T03:59:57.3285424Z b0c972c96382: Waiting 2022-05-18T03:59:57.3285647Z f709baccd3f5: Waiting 2022-05-18T03:59:57.3285885Z 25dff8b9a054: Waiting 2022-05-18T03:59:57.3286118Z 053d59c76970: Waiting 2022-05-18T03:59:57.3286334Z 1c1fd12e267d: Waiting 2022-05-18T03:59:57.3286574Z 6b52bc4fc524: Waiting 2022-05-18T03:59:57.3286813Z 30dcacd2ffe2: Waiting 2022-05-18T03:59:57.4710989Z 9b0c32b3202c: Verifying Checksum 2022-05-18T03:59:57.4711554Z 9b0c32b3202c: Download complete 2022-05-18T03:59:57.5215138Z 55d4aa3df964: Verifying Checksum 2022-05-18T03:59:57.5215739Z 55d4aa3df964: Download complete 2022-05-18T03:59:57.6075443Z a6d5f855f26c: Download complete 2022-05-18T03:59:57.6452466Z 11323ed2c653: Verifying Checksum 2022-05-18T03:59:57.6453034Z 11323ed2c653: Download complete 2022-05-18T03:59:57.7091337Z 53b0132b34a2: Verifying Checksum 2022-05-18T03:59:57.7091825Z 53b0132b34a2: Download complete 2022-05-18T03:59:57.8232870Z ced0e45f533f: Verifying Checksum 2022-05-18T03:59:57.8233507Z ced0e45f533f: Download complete 2022-05-18T03:59:57.9147094Z 776e7a7e28b2: Download complete 2022-05-18T03:59:58.4774943Z 11323ed2c653: Pull complete 2022-05-18T03:59:58.7628531Z 9b0c32b3202c: Pull complete 2022-05-18T03:59:59.0695261Z 55d4aa3df964: Pull complete 2022-05-18T03:59:59.1970604Z ced0e45f533f: Pull complete 2022-05-18T03:59:59.3342203Z a6d5f855f26c: Pull complete 2022-05-18T04:00:04.4791519Z 69004237646f: Verifying Checksum 2022-05-18T04:00:04.4791914Z 69004237646f: Download complete 2022-05-18T04:00:04.5584961Z a0a6f96a62d8: Verifying Checksum 2022-05-18T04:00:04.5585286Z a0a6f96a62d8: Download complete 2022-05-18T04:00:06.4159015Z 7918ac79e586: Verifying Checksum 2022-05-18T04:00:06.4159394Z 7918ac79e586: Download complete 2022-05-18T04:00:06.4985965Z 517f3f32e512: Verifying Checksum 2022-05-18T04:00:06.4986405Z 517f3f32e512: Download complete 2022-05-18T04:00:06.5760054Z 7c88fb71bf11: Download complete 2022-05-18T04:00:06.6533032Z 7b920d7a1988: Verifying Checksum 2022-05-18T04:00:06.6533600Z 7b920d7a1988: Download complete 2022-05-18T04:00:06.7339898Z 0ba8a6800faf: Verifying Checksum 2022-05-18T04:00:06.7340618Z 0ba8a6800faf: Download complete 2022-05-18T04:00:06.8068058Z 6d58a87851d7: Verifying Checksum 2022-05-18T04:00:06.8068506Z 6d58a87851d7: Download complete 2022-05-18T04:00:06.8722704Z b06b299e7454: Verifying Checksum 2022-05-18T04:00:06.8723157Z b06b299e7454: Download complete 2022-05-18T04:00:07.1534226Z 532188ad0a5d: Verifying Checksum 2022-05-18T04:00:07.1537666Z 532188ad0a5d: Download complete 2022-05-18T04:00:07.2359329Z acf3886a01ad: Verifying Checksum 2022-05-18T04:00:07.2359748Z acf3886a01ad: Download complete 2022-05-18T04:00:07.3168985Z 
166228572fc8: Download complete 2022-05-18T04:00:07.3936158Z 6d680b004bdb: Download complete 2022-05-18T04:00:07.4752999Z 4d9d54d04be5: Verifying Checksum 2022-05-18T04:00:07.5695257Z 55e19101ee96: Download complete 2022-05-18T04:00:07.6456388Z d57378452c6c: Verifying Checksum 2022-05-18T04:00:08.6126863Z 4097195e70a4: Verifying Checksum 2022-05-18T04:00:08.6127756Z 4097195e70a4: Download complete 2022-05-18T04:00:08.7010628Z e90775d597ae: Download complete 2022-05-18T04:00:08.7656352Z 342cb5b8793f: Verifying Checksum 2022-05-18T04:00:08.7656677Z 342cb5b8793f: Download complete 2022-05-18T04:00:08.8593295Z ec9f4694245d: Verifying Checksum 2022-05-18T04:00:08.8593628Z ec9f4694245d: Download complete 2022-05-18T04:00:08.9463936Z 5ff41a564c23: Verifying Checksum 2022-05-18T04:00:08.9464441Z 5ff41a564c23: Download complete 2022-05-18T04:00:09.0187417Z 5e9e1c5c2b02: Verifying Checksum 2022-05-18T04:00:09.0187727Z 5e9e1c5c2b02: Download complete 2022-05-18T04:00:10.3987080Z d63f711e9949: Verifying Checksum 2022-05-18T04:00:10.3987482Z d63f711e9949: Download complete 2022-05-18T04:00:10.4724918Z 7bd074c80c3f: Verifying Checksum 2022-05-18T04:00:10.4725271Z 7bd074c80c3f: Download complete 2022-05-18T04:00:10.5424896Z 7ebce38575d6: Verifying Checksum 2022-05-18T04:00:10.5425212Z 7ebce38575d6: Download complete 2022-05-18T04:00:10.7948366Z 3dcf0fc78ba8: Verifying Checksum 2022-05-18T04:00:10.7948697Z 3dcf0fc78ba8: Download complete 2022-05-18T04:00:10.8600160Z de93ffc12e40: Verifying Checksum 2022-05-18T04:00:10.8600505Z de93ffc12e40: Download complete 2022-05-18T04:00:10.9348043Z fd0f553736b3: Verifying Checksum 2022-05-18T04:00:10.9348631Z fd0f553736b3: Download complete 2022-05-18T04:00:11.0252688Z 6b52bc4fc524: Verifying Checksum 2022-05-18T04:00:11.0253128Z 6b52bc4fc524: Download complete 2022-05-18T04:00:11.0704983Z 85cae8860e8b: Verifying Checksum 2022-05-18T04:00:11.0705297Z 85cae8860e8b: Download complete 2022-05-18T04:00:11.1400363Z 25dff8b9a054: Verifying Checksum 2022-05-18T04:00:11.1400683Z 25dff8b9a054: Download complete 2022-05-18T04:00:11.2164581Z bcd88fe424d2: Download complete 2022-05-18T04:00:11.2923078Z 8710652e57c7: Verifying Checksum 2022-05-18T04:00:11.2923380Z 8710652e57c7: Download complete 2022-05-18T04:00:11.3672143Z 050758b5b900: Verifying Checksum 2022-05-18T04:00:11.3672772Z 050758b5b900: Download complete 2022-05-18T04:00:11.5559549Z e104e8ddd08b: Verifying Checksum 2022-05-18T04:00:11.5560198Z e104e8ddd08b: Download complete 2022-05-18T04:00:11.6259553Z b0c972c96382: Download complete 2022-05-18T04:00:12.2316012Z 053d59c76970: Verifying Checksum 2022-05-18T04:00:12.2316598Z 053d59c76970: Download complete 2022-05-18T04:00:12.2999327Z 30dcacd2ffe2: Verifying Checksum 2022-05-18T04:00:12.2999812Z 30dcacd2ffe2: Download complete 2022-05-18T04:00:12.3859951Z 1c1fd12e267d: Verifying Checksum 2022-05-18T04:00:12.3860330Z 1c1fd12e267d: Download complete 2022-05-18T04:00:14.4864920Z f709baccd3f5: Verifying Checksum 2022-05-18T04:00:14.4865263Z f709baccd3f5: Download complete 2022-05-18T04:00:16.2738028Z 532188ad0a5d: Pull complete 2022-05-18T04:00:16.3915676Z 53b0132b34a2: Pull complete 2022-05-18T04:00:27.9043549Z b046a45d4ca8: Verifying Checksum 2022-05-18T04:00:27.9043892Z b046a45d4ca8: Download complete 2022-05-18T04:00:30.1202434Z d63f711e9949: Pull complete 2022-05-18T04:00:32.1585154Z 776e7a7e28b2: Pull complete 2022-05-18T04:00:40.7833156Z 69004237646f: Pull complete 2022-05-18T04:00:42.1893621Z a0a6f96a62d8: Pull complete 2022-05-18T04:00:48.6181381Z 7918ac79e586: 
Pull complete 2022-05-18T04:00:50.5129347Z 517f3f32e512: Pull complete 2022-05-18T04:00:52.4604069Z 7c88fb71bf11: Pull complete 2022-05-18T04:00:54.5297514Z 7b920d7a1988: Pull complete 2022-05-18T04:00:56.7703342Z 0ba8a6800faf: Pull complete 2022-05-18T04:00:58.9002973Z 6d58a87851d7: Pull complete 2022-05-18T04:01:01.1730637Z b06b299e7454: Pull complete 2022-05-18T04:01:36.3480703Z b046a45d4ca8: Pull complete 2022-05-18T04:01:38.1759801Z acf3886a01ad: Pull complete 2022-05-18T04:01:38.3038847Z 166228572fc8: Pull complete 2022-05-18T04:01:38.4289885Z 6d680b004bdb: Pull complete 2022-05-18T04:01:38.5410483Z 4d9d54d04be5: Pull complete 2022-05-18T04:01:38.6449103Z 55e19101ee96: Pull complete 2022-05-18T04:01:38.7448170Z d57378452c6c: Pull complete 2022-05-18T04:01:41.0091572Z 4097195e70a4: Pull complete 2022-05-18T04:01:41.1165377Z e90775d597ae: Pull complete 2022-05-18T04:01:41.2285221Z 342cb5b8793f: Pull complete 2022-05-18T04:01:41.3749259Z ec9f4694245d: Pull complete 2022-05-18T04:01:41.4793456Z 5ff41a564c23: Pull complete 2022-05-18T04:01:41.5870542Z 5e9e1c5c2b02: Pull complete 2022-05-18T04:01:49.5116912Z 85cae8860e8b: Pull complete 2022-05-18T04:01:51.4187618Z 7bd074c80c3f: Pull complete 2022-05-18T04:01:53.2886262Z 7ebce38575d6: Pull complete 2022-05-18T04:01:55.7188678Z 3dcf0fc78ba8: Pull complete 2022-05-18T04:01:59.1894556Z de93ffc12e40: Pull complete 2022-05-18T04:02:02.4177342Z fd0f553736b3: Pull complete 2022-05-18T04:02:04.2104109Z 6b52bc4fc524: Pull complete 2022-05-18T04:02:11.8152384Z f709baccd3f5: Pull complete 2022-05-18T04:02:13.5849128Z 25dff8b9a054: Pull complete 2022-05-18T04:02:15.2645984Z bcd88fe424d2: Pull complete 2022-05-18T04:02:17.1101668Z 8710652e57c7: Pull complete 2022-05-18T04:02:19.5170002Z 050758b5b900: Pull complete 2022-05-18T04:02:23.3946507Z e104e8ddd08b: Pull complete 2022-05-18T04:02:23.5019442Z b0c972c96382: Pull complete 2022-05-18T04:02:25.3281623Z 053d59c76970: Pull complete 2022-05-18T04:02:25.4445192Z 30dcacd2ffe2: Pull complete 2022-05-18T04:02:25.5746593Z 1c1fd12e267d: Pull complete 2022-05-18T04:02:25.5882570Z Digest: sha256:9737b662edb86afcd12a9367db6178a57889543632c0b710c5058abe14dc048f 2022-05-18T04:02:25.5923112Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T04:02:25.5956625Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T04:02:25.6067237Z ##[group]Run nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a 2022-05-18T04:02:25.6067592Z with: 2022-05-18T04:02:25.6067815Z timeout_minutes: 10 2022-05-18T04:02:25.6068077Z max_attempts: 3 2022-05-18T04:02:25.6068484Z command: set -ex bash .github/scripts/install_nvidia_utils_linux.sh echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" 2022-05-18T04:02:25.6068900Z retry_wait_seconds: 10 2022-05-18T04:02:25.6069166Z polling_interval_seconds: 1 2022-05-18T04:02:25.6069445Z warning_on_retry: true 2022-05-18T04:02:25.6069718Z continue_on_error: false 2022-05-18T04:02:25.6069951Z env: 2022-05-18T04:02:25.6070167Z IN_CI: 1 2022-05-18T04:02:25.6070397Z IS_GHA: 1 2022-05-18T04:02:25.6070634Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:02:25.6070895Z ##[endgroup] 2022-05-18T04:02:25.6507596Z 2022-05-18T04:02:25.6579629Z == Installing nvidia container toolkit for amzn2 == 2022-05-18T04:02:25.6582499Z + bash .github/scripts/install_nvidia_utils_linux.sh 
2022-05-18T04:02:25.6582946Z + sudo yum install -y yum-utils 2022-05-18T04:02:26.2056637Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-05-18T04:02:27.5683954Z Package yum-utils-1.1.31-46.amzn2.0.1.noarch already installed and latest version 2022-05-18T04:02:27.5684378Z Nothing to do 2022-05-18T04:02:27.6387739Z + sudo yum-config-manager --add-repo https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo 2022-05-18T04:02:28.1798867Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-05-18T04:02:28.2141169Z adding repo from: https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo 2022-05-18T04:02:28.2142084Z grabbing file https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo to /etc/yum.repos.d/nvidia-docker.repo 2022-05-18T04:02:28.2142614Z repo saved to /etc/yum.repos.d/nvidia-docker.repo 2022-05-18T04:02:28.2283772Z + sudo yum install -y nvidia-docker2 2022-05-18T04:02:28.7610717Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-05-18T04:02:30.0024496Z Package nvidia-docker2-2.10.0-1.noarch already installed and latest version 2022-05-18T04:02:30.0025374Z Nothing to do 2022-05-18T04:02:30.0769621Z + sudo systemctl restart docker 2022-05-18T04:02:37.4342053Z == Installing nvidia driver NVIDIA-Linux-x86_64-510.60.02.run == 2022-05-18T04:02:37.4342766Z + sudo yum groupinstall -y 'Development Tools' 2022-05-18T04:02:37.9728681Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-05-18T04:02:38.9992533Z Maybe run: yum groups mark install (see man yum) 2022-05-18T04:02:38.9992984Z No packages in any requested group available to install or update 2022-05-18T04:02:39.0659148Z ++ uname -r 2022-05-18T04:02:39.0664841Z + sudo yum install -y 'kernel-devel-uname-r == 4.14.252-195.483.amzn2.x86_64' 2022-05-18T04:02:39.6050358Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-05-18T04:02:40.8622225Z Package kernel-devel-4.14.252-195.483.amzn2.x86_64 already installed and latest version 2022-05-18T04:02:40.8622648Z Nothing to do 2022-05-18T04:02:40.9330728Z + sudo curl -fsL -o /tmp/nvidia_driver https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-510.60.02.run 2022-05-18T04:02:44.2471360Z + sudo /bin/bash /tmp/nvidia_driver -s --no-drm 2022-05-18T04:02:45.4751137Z Verifying archive integrity... OK 2022-05-18T04:03:09.6715389Z Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 510.60.02.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... 2022-05-18T04:03:09.8984497Z 2022-05-18T04:03:09.8985109Z WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver. 2022-05-18T04:03:09.8987789Z 2022-05-18T04:03:24.5476916Z 2022-05-18T04:03:24.5478176Z WARNING: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules'; these paths were not queryable from the system. 
If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver. 2022-05-18T04:03:24.5478902Z 2022-05-18T04:03:33.4822585Z + sudo rm -fv /tmp/nvidia_driver 2022-05-18T04:03:33.5342067Z removed ‘/tmp/nvidia_driver’ 2022-05-18T04:03:33.5356085Z + nvidia-smi 2022-05-18T04:03:37.6685466Z Wed May 18 04:03:37 2022 2022-05-18T04:03:37.6686075Z +-----------------------------------------------------------------------------+ 2022-05-18T04:03:37.6686630Z | NVIDIA-SMI 510.60.02 Driver Version: 510.60.02 CUDA Version: 11.6 | 2022-05-18T04:03:37.6687128Z |-------------------------------+----------------------+----------------------+ 2022-05-18T04:03:37.6690008Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2022-05-18T04:03:37.6690556Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2022-05-18T04:03:37.6690964Z | | | MIG M. | 2022-05-18T04:03:37.6691273Z |===============================+======================+======================| 2022-05-18T04:03:37.6736025Z | 0 Tesla M60 Off | 00000000:00:1D.0 Off | 0 | 2022-05-18T04:03:37.6736451Z | N/A 29C P0 38W / 150W | 0MiB / 7680MiB | 0% Default | 2022-05-18T04:03:37.6736773Z | | | N/A | 2022-05-18T04:03:37.6737327Z +-------------------------------+----------------------+----------------------+ 2022-05-18T04:03:37.6786360Z | 1 Tesla M60 Off | 00000000:00:1E.0 Off | 0 | 2022-05-18T04:03:37.6786744Z | N/A 33C P0 38W / 150W | 0MiB / 7680MiB | 98% Default | 2022-05-18T04:03:37.6787072Z | | | N/A | 2022-05-18T04:03:37.6787866Z +-------------------------------+----------------------+----------------------+ 2022-05-18T04:03:37.6788216Z 2022-05-18T04:03:37.6788656Z +-----------------------------------------------------------------------------+ 2022-05-18T04:03:37.6789032Z | Processes: | 2022-05-18T04:03:37.6789366Z | GPU GI CI PID Type Process name GPU Memory | 2022-05-18T04:03:37.6789710Z | ID ID Usage | 2022-05-18T04:03:37.6790031Z |=============================================================================| 2022-05-18T04:03:37.6791789Z | No running processes found | 2022-05-18T04:03:37.6792286Z +-----------------------------------------------------------------------------+ 2022-05-18T04:03:38.1957797Z + echo 'GPU_FLAG=--gpus all' 2022-05-18T04:03:38.7232633Z Command completed after 1 attempt(s). 
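[Note] For reference, the GPU runner setup performed by the retried step above can be approximated with the commands below. Every command, the driver version (510.60.02), and the S3 URL are taken from this log; treat this only as a hedged sketch of what .github/scripts/install_nvidia_utils_linux.sh did on this Amazon Linux 2 host, not as the canonical contents of that script.

    # Sketch of the nvidia container toolkit + driver install seen above (Amazon Linux 2)
    sudo yum install -y yum-utils
    sudo yum-config-manager --add-repo https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo
    sudo yum install -y nvidia-docker2
    sudo systemctl restart docker

    # Build prerequisites for the driver's kernel module (kernel-devel matching the running kernel)
    sudo yum groupinstall -y 'Development Tools'
    sudo yum install -y "kernel-devel-uname-r == $(uname -r)"

    # Fetch and run the driver installer silently, skipping the nvidia-drm module as in the log
    sudo curl -fsL -o /tmp/nvidia_driver https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-510.60.02.run
    sudo /bin/bash /tmp/nvidia_driver -s --no-drm
    sudo rm -fv /tmp/nvidia_driver

    nvidia-smi                                      # confirm the Tesla M60 GPUs are visible
    echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"   # later docker run steps pick this up to expose the GPUs

In the workflow this sequence runs under nick-fields/retry with a 10-minute timeout and up to 3 attempts; here it succeeded on the first attempt, as recorded above.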
2022-05-18T04:03:38.7232869Z 2022-05-18T04:03:38.7296587Z Prepare all required actions 2022-05-18T04:03:38.7296976Z Getting action download info 2022-05-18T04:03:38.9788783Z Download action repository 'seemethere/download-artifact-s3@v3' (SHA:64048a097659c8ca71ceacbb3c01cee9ed6f1b05) 2022-05-18T04:03:39.2203804Z Download action repository 'actions/download-artifact@v2' (SHA:f023be2c48cc18debc3bacd34cb396e0295e2869) 2022-05-18T04:03:39.3287086Z ##[group]Run ./.github/actions/download-build-artifacts 2022-05-18T04:03:39.3287389Z with: 2022-05-18T04:03:39.3287654Z name: linux-bionic-cuda10.2-py3.9-gcc7 2022-05-18T04:03:39.3287930Z env: 2022-05-18T04:03:39.3288139Z IN_CI: 1 2022-05-18T04:03:39.3288347Z IS_GHA: 1 2022-05-18T04:03:39.3288596Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:03:39.3288858Z GPU_FLAG: --gpus all 2022-05-18T04:03:39.3289086Z ##[endgroup] 2022-05-18T04:03:39.3317365Z ##[group]Run seemethere/download-artifact-s3@v3 2022-05-18T04:03:39.3317671Z with: 2022-05-18T04:03:39.3317982Z name: linux-bionic-cuda10.2-py3.9-gcc7 2022-05-18T04:03:39.3318275Z s3-bucket: gha-artifacts 2022-05-18T04:03:39.3318536Z region: us-east-1 2022-05-18T04:03:39.3318763Z env: 2022-05-18T04:03:39.3318956Z IN_CI: 1 2022-05-18T04:03:39.3319175Z IS_GHA: 1 2022-05-18T04:03:39.3319417Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:03:39.3319665Z GPU_FLAG: --gpus all 2022-05-18T04:03:39.3319909Z ##[endgroup] 2022-05-18T04:03:39.8420283Z Found 1 objects with prefix pytorch/pytorch/2342799949/1/linux-bionic-cuda10.2-py3.9-gcc7/ 2022-05-18T04:03:39.8420891Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2022-05-18T04:03:46.7532754Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2022-05-18T04:03:46.7533085Z 2022-05-18T04:03:46.7534606Z Artifact download has finished successfully 2022-05-18T04:03:46.7670013Z ##[group]Run unzip -o artifacts.zip 2022-05-18T04:03:46.7670339Z unzip -o artifacts.zip 2022-05-18T04:03:46.7683731Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T04:03:46.7684012Z env: 2022-05-18T04:03:46.7684281Z IN_CI: 1 2022-05-18T04:03:46.7684508Z IS_GHA: 1 2022-05-18T04:03:46.7684736Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:03:46.7685005Z GPU_FLAG: --gpus all 2022-05-18T04:03:46.7685255Z ##[endgroup] 2022-05-18T04:03:46.7728500Z Archive: artifacts.zip 2022-05-18T04:03:46.7730637Z creating: dist/ 2022-05-18T04:03:48.5639587Z inflating: dist/torch-1.12.0a0+git3b23752-cp39-cp39-linux_x86_64.whl 2022-05-18T04:03:48.5639988Z creating: build/custom_test_artifacts/ 2022-05-18T04:03:48.5640413Z creating: build/custom_test_artifacts/custom-op-build/ 2022-05-18T04:03:48.5640881Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2022-05-18T04:03:48.5647426Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeOutput.log 2022-05-18T04:03:48.5647980Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/ 2022-05-18T04:03:48.5648777Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2022-05-18T04:03:48.5649336Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/ 2022-05-18T04:03:48.5649875Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2022-05-18T04:03:48.5651865Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2022-05-18T04:03:48.5653035Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2022-05-18T04:03:48.5653604Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2022-05-18T04:03:48.5654149Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2022-05-18T04:03:48.5656649Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2022-05-18T04:03:48.5657925Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2022-05-18T04:03:48.5659336Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2022-05-18T04:03:48.5660091Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2022-05-18T04:03:48.5661975Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2022-05-18T04:03:48.5663192Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2022-05-18T04:03:48.5664380Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2022-05-18T04:03:48.5664965Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2022-05-18T04:03:48.5709568Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2022-05-18T04:03:48.5710888Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2022-05-18T04:03:48.5712113Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2022-05-18T04:03:48.5713484Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2022-05-18T04:03:48.5714443Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2022-05-18T04:03:48.5715283Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2022-05-18T04:03:48.5715964Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_30.cubin 2022-05-18T04:03:48.5716670Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2022-05-18T04:03:48.5717374Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2022-05-18T04:03:48.5752401Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2022-05-18T04:03:48.5787407Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2022-05-18T04:03:48.5788954Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2022-05-18T04:03:48.5790052Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_30.cubin 2022-05-18T04:03:48.5791121Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2022-05-18T04:03:48.5792314Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 
2022-05-18T04:03:48.5792948Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2022-05-18T04:03:48.5793900Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2022-05-18T04:03:48.5794551Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2022-05-18T04:03:48.5853612Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2022-05-18T04:03:48.5913246Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2022-05-18T04:03:48.5914220Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2022-05-18T04:03:48.5915267Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2022-05-18T04:03:48.5916447Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeError.log 2022-05-18T04:03:48.5917338Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2022-05-18T04:03:48.5918331Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2022-05-18T04:03:48.5919123Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2022-05-18T04:03:48.5920265Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2022-05-18T04:03:48.5921252Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2022-05-18T04:03:48.5921985Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2022-05-18T04:03:48.5922572Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2022-05-18T04:03:48.5923155Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2022-05-18T04:03:48.5923745Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2022-05-18T04:03:48.5924330Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2022-05-18T04:03:48.5924911Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2022-05-18T04:03:48.5941203Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2022-05-18T04:03:48.6050439Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2022-05-18T04:03:48.6051413Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2022-05-18T04:03:48.6052493Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2022-05-18T04:03:48.6053462Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2022-05-18T04:03:48.6054459Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2022-05-18T04:03:48.6055558Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2022-05-18T04:03:48.6056408Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2022-05-18T04:03:48.6057005Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2022-05-18T04:03:48.6057615Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2022-05-18T04:03:48.6058225Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2022-05-18T04:03:48.6058826Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2022-05-18T04:03:48.6076147Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2022-05-18T04:03:48.6156551Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2022-05-18T04:03:48.6157593Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2022-05-18T04:03:48.6158427Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2022-05-18T04:03:48.6159155Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2022-05-18T04:03:48.6159881Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2022-05-18T04:03:48.6160733Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2022-05-18T04:03:48.6161268Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2022-05-18T04:03:48.6163605Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2022-05-18T04:03:48.6164621Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2022-05-18T04:03:48.6165275Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2022-05-18T04:03:48.6254243Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2022-05-18T04:03:48.6315311Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2022-05-18T04:03:48.6315799Z creating: build/custom_test_artifacts/jit-hook-build/ 2022-05-18T04:03:48.6316256Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2022-05-18T04:03:48.6322619Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeOutput.log 2022-05-18T04:03:48.6323633Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/ 2022-05-18T04:03:48.6324250Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2022-05-18T04:03:48.6324812Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/ 2022-05-18T04:03:48.6325347Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2022-05-18T04:03:48.6326519Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2022-05-18T04:03:48.6328065Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2022-05-18T04:03:48.6328856Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2022-05-18T04:03:48.6329397Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2022-05-18T04:03:48.6331497Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2022-05-18T04:03:48.6332615Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2022-05-18T04:03:48.6334105Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2022-05-18T04:03:48.6334959Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 
2022-05-18T04:03:48.6336289Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2022-05-18T04:03:48.6337560Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2022-05-18T04:03:48.6338337Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2022-05-18T04:03:48.6338887Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2022-05-18T04:03:48.6383662Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2022-05-18T04:03:48.6385223Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2022-05-18T04:03:48.6386476Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2022-05-18T04:03:48.6387786Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2022-05-18T04:03:48.6388845Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2022-05-18T04:03:48.6389570Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2022-05-18T04:03:48.6390258Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_30.cubin 2022-05-18T04:03:48.6390949Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2022-05-18T04:03:48.6391720Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2022-05-18T04:03:48.6426524Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2022-05-18T04:03:48.6461130Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2022-05-18T04:03:48.6462527Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2022-05-18T04:03:48.6463841Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_30.cubin 2022-05-18T04:03:48.6464933Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2022-05-18T04:03:48.6465993Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2022-05-18T04:03:48.6466638Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2022-05-18T04:03:48.6467590Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2022-05-18T04:03:48.6468247Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2022-05-18T04:03:48.6527314Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2022-05-18T04:03:48.6586839Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2022-05-18T04:03:48.6587720Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2022-05-18T04:03:48.6588689Z creating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2022-05-18T04:03:48.6589674Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeError.log 2022-05-18T04:03:48.6590677Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2022-05-18T04:03:48.6591659Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2022-05-18T04:03:48.6592510Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2022-05-18T04:03:48.6593548Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2022-05-18T04:03:48.6594621Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2022-05-18T04:03:48.6595364Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2022-05-18T04:03:48.6595960Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2022-05-18T04:03:48.6596683Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2022-05-18T04:03:48.6597415Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2022-05-18T04:03:48.6598007Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2022-05-18T04:03:48.6598588Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2022-05-18T04:03:48.6614724Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2022-05-18T04:03:48.6677817Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2022-05-18T04:03:48.6678893Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2022-05-18T04:03:48.6679828Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2022-05-18T04:03:48.6680400Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2022-05-18T04:03:48.6681411Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2022-05-18T04:03:48.6682058Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2022-05-18T04:03:48.6682608Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2022-05-18T04:03:48.6684328Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2022-05-18T04:03:48.6685117Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2022-05-18T04:03:48.6685947Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2022-05-18T04:03:48.6734436Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2022-05-18T04:03:48.6734939Z creating: build/custom_test_artifacts/custom-backend-build/ 2022-05-18T04:03:48.6735416Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2022-05-18T04:03:48.6741852Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeOutput.log 2022-05-18T04:03:48.6742915Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/ 2022-05-18T04:03:48.6743489Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2022-05-18T04:03:48.6744304Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/ 
2022-05-18T04:03:48.6744887Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2022-05-18T04:03:48.6745917Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2022-05-18T04:03:48.6747433Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2022-05-18T04:03:48.6748191Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2022-05-18T04:03:48.6748779Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2022-05-18T04:03:48.6750669Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2022-05-18T04:03:48.6751941Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2022-05-18T04:03:48.6753477Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2022-05-18T04:03:48.6754362Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2022-05-18T04:03:48.6755643Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2022-05-18T04:03:48.6756857Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2022-05-18T04:03:48.6757646Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2022-05-18T04:03:48.6758237Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2022-05-18T04:03:48.6802924Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2022-05-18T04:03:48.6803864Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2022-05-18T04:03:48.6805134Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2022-05-18T04:03:48.6806610Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2022-05-18T04:03:48.6807725Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2022-05-18T04:03:48.6808772Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2022-05-18T04:03:48.6809623Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_30.cubin 2022-05-18T04:03:48.6810346Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2022-05-18T04:03:48.6811068Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2022-05-18T04:03:48.6846267Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2022-05-18T04:03:48.6881124Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2022-05-18T04:03:48.6882462Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 
2022-05-18T04:03:48.6883644Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_30.cubin 2022-05-18T04:03:48.6884616Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2022-05-18T04:03:48.6885653Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2022-05-18T04:03:48.6886309Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2022-05-18T04:03:48.6887264Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2022-05-18T04:03:48.6887928Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2022-05-18T04:03:48.6946978Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2022-05-18T04:03:48.7006700Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2022-05-18T04:03:48.7007678Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2022-05-18T04:03:48.7008733Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2022-05-18T04:03:48.7009738Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeError.log 2022-05-18T04:03:48.7010718Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2022-05-18T04:03:48.7011811Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2022-05-18T04:03:48.7012577Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2022-05-18T04:03:48.7013792Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2022-05-18T04:03:48.7014831Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2022-05-18T04:03:48.7015602Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2022-05-18T04:03:48.7016223Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2022-05-18T04:03:48.7016841Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2022-05-18T04:03:48.7017458Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2022-05-18T04:03:48.7018083Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2022-05-18T04:03:48.7018947Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2022-05-18T04:03:48.7019725Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2022-05-18T04:03:48.7164749Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2022-05-18T04:03:48.7165788Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2022-05-18T04:03:48.7166933Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2022-05-18T04:03:48.7167830Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2022-05-18T04:03:48.7168995Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2022-05-18T04:03:48.7170058Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2022-05-18T04:03:48.7170722Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2022-05-18T04:03:48.7171360Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2022-05-18T04:03:48.7171994Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2022-05-18T04:03:48.7172635Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2022-05-18T04:03:48.7173270Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2022-05-18T04:03:48.7190118Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2022-05-18T04:03:48.7247041Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2022-05-18T04:03:48.7248204Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2022-05-18T04:03:48.7249134Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2022-05-18T04:03:48.7249797Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2022-05-18T04:03:48.7250839Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2022-05-18T04:03:48.7251430Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2022-05-18T04:03:48.7251984Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2022-05-18T04:03:48.7253959Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2022-05-18T04:03:48.7254794Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2022-05-18T04:03:48.7255558Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2022-05-18T04:03:48.7372641Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2022-05-18T04:03:48.7416946Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2022-05-18T04:03:48.7417413Z creating: build/lib/ 2022-05-18T04:03:48.7418123Z inflating: build/lib/libclog.a 2022-05-18T04:03:48.7484293Z inflating: build/lib/libgtest.a 2022-05-18T04:03:48.7494461Z inflating: build/lib/libpthreadpool.a 2022-05-18T04:03:48.7584271Z inflating: build/lib/libbenchmark.a 2022-05-18T04:03:48.7690299Z inflating: build/lib/libprotobuf-lite.a 2022-05-18T04:03:48.7722201Z inflating: build/lib/libtensorpipe_uv.a 2022-05-18T04:03:48.7777892Z inflating: build/lib/libasmjit.a 2022-05-18T04:03:48.7913052Z inflating: build/lib/libgloo.a 2022-05-18T04:03:48.8446296Z inflating: build/lib/libprotobuf.a 2022-05-18T04:03:48.8465986Z inflating: build/lib/libfmt.a 2022-05-18T04:03:48.8467805Z inflating: build/lib/libcaffe2_nvrtc.so 2022-05-18T04:03:48.8468468Z inflating: build/lib/libfoxi_loader.a 2022-05-18T04:03:48.8534384Z inflating: build/lib/libc10.so 2022-05-18T04:03:48.8535452Z inflating: 
build/lib/libtorch_global_deps.so 2022-05-18T04:03:48.8545564Z inflating: build/lib/libcpuinfo.a 2022-05-18T04:03:48.8554393Z inflating: build/lib/libcpuinfo_internals.a 2022-05-18T04:03:48.8570117Z inflating: build/lib/libqnnpack.a 2022-05-18T04:03:48.9140070Z inflating: build/lib/libprotoc.a 2022-05-18T04:03:48.9142606Z inflating: build/lib/libnnpack_reference_layers.a 2022-05-18T04:03:48.9166761Z inflating: build/lib/libpytorch_qnnpack.a 2022-05-18T04:03:48.9185737Z inflating: build/lib/libgmock.a 2022-05-18T04:03:48.9186332Z inflating: build/lib/libgtest_main.a 2022-05-18T04:03:48.9187193Z inflating: build/lib/libbenchmark_main.a 2022-05-18T04:03:48.9209477Z inflating: build/lib/libnnpack.a 2022-05-18T04:03:49.7318423Z inflating: build/lib/libdnnl.a 2022-05-18T04:03:49.7973872Z inflating: build/lib/libtensorpipe.a 2022-05-18T04:03:49.8016945Z inflating: build/lib/libc10_cuda.so 2022-05-18T04:03:49.9535381Z inflating: build/lib/libfbgemm.a 2022-05-18T04:03:49.9536311Z inflating: build/lib/libgmock_main.a 2022-05-18T04:03:50.0664869Z inflating: build/lib/libdnnl_graph.a 2022-05-18T04:03:50.1092082Z inflating: build/lib/libkineto.a 2022-05-18T04:03:50.1381662Z inflating: build/lib/libtensorpipe_cuda.a 2022-05-18T04:03:50.1427055Z inflating: build/lib/libcaffe2_protos.a 2022-05-18T04:03:50.1474860Z inflating: build/lib/libonnx_proto.a 2022-05-18T04:03:50.1616296Z inflating: build/lib/libXNNPACK.a 2022-05-18T04:03:50.2280008Z inflating: build/lib/libonnx.a 2022-05-18T04:03:50.2709317Z inflating: build/lib/libgloo_cuda.a 2022-05-18T04:03:52.3682919Z inflating: build/lib/libtorch_cpu.so 2022-05-18T04:03:54.3546798Z inflating: build/lib/libtorch_cuda.so 2022-05-18T04:03:54.3547777Z inflating: build/lib/libtorch.so 2022-05-18T04:03:54.3551663Z inflating: build/lib/libc10d_cuda_test.so 2022-05-18T04:03:54.9695206Z inflating: build/lib/libtorch_cuda_linalg.so 2022-05-18T04:03:54.9718787Z inflating: build/lib/libjitbackend_test.so 2022-05-18T04:03:54.9749656Z inflating: build/lib/libbackend_with_compiler.so 2022-05-18T04:03:54.9802580Z inflating: build/lib/libtorchbind_test.so 2022-05-18T04:03:54.9808032Z inflating: build/lib/libshm.so 2022-05-18T04:03:55.1387521Z inflating: build/lib/libtorch_python.so 2022-05-18T04:03:55.1425628Z inflating: build/lib/libnnapi_backend.so 2022-05-18T04:03:55.1426035Z creating: build/bin/ 2022-05-18T04:03:55.1478256Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2022-05-18T04:03:55.1533200Z inflating: build/bin/c10_DeviceGuard_test 2022-05-18T04:03:55.1586934Z inflating: build/bin/c10_Device_test 2022-05-18T04:03:55.1648542Z inflating: build/bin/c10_DispatchKeySet_test 2022-05-18T04:03:55.1699630Z inflating: build/bin/c10_StreamGuard_test 2022-05-18T04:03:55.1759028Z inflating: build/bin/c10_InlineDeviceGuard_test 2022-05-18T04:03:55.1818479Z inflating: build/bin/c10_InlineStreamGuard_test 2022-05-18T04:03:55.1879483Z inflating: build/bin/c10_SizesAndStrides_test 2022-05-18T04:03:55.1930425Z inflating: build/bin/c10_Array_test 2022-05-18T04:03:55.1987085Z inflating: build/bin/c10_Bitset_test 2022-05-18T04:03:55.2041400Z inflating: build/bin/c10_C++17_test 2022-05-18T04:03:55.2117178Z inflating: build/bin/c10_ConstexprCrc_test 2022-05-18T04:03:55.2169503Z inflating: build/bin/c10_DeadlockDetection_test 2022-05-18T04:03:55.2222304Z inflating: build/bin/c10_Half_test 2022-05-18T04:03:55.2283130Z inflating: build/bin/c10_LeftRight_test 2022-05-18T04:03:55.2350172Z inflating: build/bin/c10_Metaprogramming_test 2022-05-18T04:03:55.2505858Z inflating: 
build/bin/c10_SmallVectorTest 2022-05-18T04:03:55.2559479Z inflating: build/bin/c10_Synchronized_test 2022-05-18T04:03:55.2620437Z inflating: build/bin/c10_ThreadLocal_test 2022-05-18T04:03:55.2676529Z inflating: build/bin/c10_TypeIndex_test 2022-05-18T04:03:55.2730309Z inflating: build/bin/c10_TypeList_test 2022-05-18T04:03:55.2781542Z inflating: build/bin/c10_TypeTraits_test 2022-05-18T04:03:55.2836570Z inflating: build/bin/c10_accumulate_test 2022-05-18T04:03:55.2896033Z inflating: build/bin/c10_bfloat16_test 2022-05-18T04:03:55.2953681Z inflating: build/bin/c10_complex_math_test 2022-05-18T04:03:55.3012879Z inflating: build/bin/c10_complex_test 2022-05-18T04:03:55.3130520Z inflating: build/bin/c10_either_test 2022-05-18T04:03:55.3186268Z inflating: build/bin/c10_exception_test 2022-05-18T04:03:55.3239517Z inflating: build/bin/c10_flags_test 2022-05-18T04:03:55.3421538Z inflating: build/bin/c10_intrusive_ptr_test 2022-05-18T04:03:55.3475497Z inflating: build/bin/c10_irange_test 2022-05-18T04:03:55.3536848Z inflating: build/bin/c10_logging_test 2022-05-18T04:03:55.3603226Z inflating: build/bin/c10_ordered_preserving_dict_test 2022-05-18T04:03:55.3683254Z inflating: build/bin/c10_optional_test 2022-05-18T04:03:55.3741696Z inflating: build/bin/c10_registry_test 2022-05-18T04:03:55.3805020Z inflating: build/bin/c10_string_view_test 2022-05-18T04:03:55.3859635Z inflating: build/bin/c10_tempfile_test 2022-05-18T04:03:55.3919813Z inflating: build/bin/c10_typeid_test 2022-05-18T04:03:55.3978532Z inflating: build/bin/c10_intrusive_ptr_benchmark 2022-05-18T04:03:55.4499800Z inflating: build/bin/protoc-3.13.0.0 2022-05-18T04:03:55.5020484Z inflating: build/bin/protoc 2022-05-18T04:03:55.5072288Z inflating: build/bin/c10_cuda_CUDATest 2022-05-18T04:03:55.5389713Z inflating: build/bin/vec_test_all_types_DEFAULT 2022-05-18T04:03:55.5743558Z inflating: build/bin/vec_test_all_types_AVX2 2022-05-18T04:03:55.5800940Z inflating: build/bin/HashStoreTest 2022-05-18T04:03:55.5857965Z inflating: build/bin/FileStoreTest 2022-05-18T04:03:55.5922604Z inflating: build/bin/TCPStoreTest 2022-05-18T04:03:55.5938094Z inflating: build/bin/ProcessGroupMPITest 2022-05-18T04:03:55.5941093Z inflating: build/bin/example_allreduce 2022-05-18T04:03:55.5997156Z inflating: build/bin/Dimname_test 2022-05-18T04:03:55.6058470Z inflating: build/bin/scalar_test 2022-05-18T04:03:55.6122454Z inflating: build/bin/apply_utils_test 2022-05-18T04:03:55.6186229Z inflating: build/bin/basic 2022-05-18T04:03:55.6249203Z inflating: build/bin/atest 2022-05-18T04:03:55.6310819Z inflating: build/bin/NamedTensor_test 2022-05-18T04:03:55.6368402Z inflating: build/bin/broadcast_test 2022-05-18T04:03:55.6422639Z inflating: build/bin/wrapdim_test 2022-05-18T04:03:55.6501487Z inflating: build/bin/Dict_test 2022-05-18T04:03:55.6554498Z inflating: build/bin/dlconvertor_test 2022-05-18T04:03:55.6614998Z inflating: build/bin/half_test 2022-05-18T04:03:55.6674861Z inflating: build/bin/native_test 2022-05-18T04:03:55.6676154Z inflating: build/bin/verify_api_visibility 2022-05-18T04:03:55.6732083Z inflating: build/bin/undefined_tensor_test 2022-05-18T04:03:55.6734847Z inflating: build/bin/thread_init_test 2022-05-18T04:03:55.6795192Z inflating: build/bin/scalar_tensor_test 2022-05-18T04:03:55.6854988Z inflating: build/bin/test_parallel 2022-05-18T04:03:55.6909436Z inflating: build/bin/weakref_test 2022-05-18T04:03:55.6961992Z inflating: build/bin/lazy_tensor_test 2022-05-18T04:03:55.7023226Z inflating: build/bin/quantized_test 2022-05-18T04:03:55.7076974Z 
inflating: build/bin/operators_test 2022-05-18T04:03:55.7137172Z inflating: build/bin/extension_backend_test 2022-05-18T04:03:55.7193687Z inflating: build/bin/math_kernel_test 2022-05-18T04:03:55.7248672Z inflating: build/bin/memory_overlapping_test 2022-05-18T04:03:55.7301583Z inflating: build/bin/variant_test 2022-05-18T04:03:55.7385778Z inflating: build/bin/tensor_iterator_test 2022-05-18T04:03:55.7441342Z inflating: build/bin/cpu_profiling_allocator_test 2022-05-18T04:03:55.7504472Z inflating: build/bin/cpu_generator_test 2022-05-18T04:03:55.7558781Z inflating: build/bin/reportMemoryUsage_test 2022-05-18T04:03:55.7611535Z inflating: build/bin/reduce_ops_test 2022-05-18T04:03:55.7667549Z inflating: build/bin/memory_format_test 2022-05-18T04:03:55.7737036Z inflating: build/bin/pow_test 2022-05-18T04:03:55.7792840Z inflating: build/bin/mobile_memory_cleanup 2022-05-18T04:03:55.7845845Z inflating: build/bin/dispatch_key_set_test 2022-05-18T04:03:55.7910527Z inflating: build/bin/IListRef_test 2022-05-18T04:03:55.8030267Z inflating: build/bin/List_test 2022-05-18T04:03:55.8085127Z inflating: build/bin/stride_properties_test 2022-05-18T04:03:55.8158397Z inflating: build/bin/vmap_test 2022-05-18T04:03:55.8286577Z inflating: build/bin/kernel_function_legacy_test 2022-05-18T04:03:55.8390025Z inflating: build/bin/kernel_function_test 2022-05-18T04:03:55.8525727Z inflating: build/bin/kernel_lambda_legacy_test 2022-05-18T04:03:55.8637037Z inflating: build/bin/kernel_lambda_test 2022-05-18T04:03:55.8701018Z inflating: build/bin/kernel_stackbased_test 2022-05-18T04:03:55.8803392Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2022-05-18T04:03:55.8857437Z inflating: build/bin/CppSignature_test 2022-05-18T04:03:55.8908813Z inflating: build/bin/op_allowlist_test 2022-05-18T04:03:55.9217638Z inflating: build/bin/op_registration_test 2022-05-18T04:03:55.9312465Z inflating: build/bin/cpu_rng_test 2022-05-18T04:03:55.9369130Z inflating: build/bin/inline_container_test 2022-05-18T04:03:55.9437988Z inflating: build/bin/KernelFunction_test 2022-05-18T04:03:55.9502807Z inflating: build/bin/type_test 2022-05-18T04:03:55.9565572Z inflating: build/bin/cuda_atomic_ops_test 2022-05-18T04:03:55.9666752Z inflating: build/bin/ivalue_test 2022-05-18T04:03:55.9740525Z inflating: build/bin/cuda_complex_math_test 2022-05-18T04:03:55.9804066Z inflating: build/bin/cuda_complex_test 2022-05-18T04:03:55.9860011Z inflating: build/bin/cuda_apply_test 2022-05-18T04:03:55.9914825Z inflating: build/bin/cuda_integer_divider_test 2022-05-18T04:03:55.9980342Z inflating: build/bin/cuda_stream_test 2022-05-18T04:03:56.0040821Z inflating: build/bin/backend_fallback_test 2022-05-18T04:03:56.0098086Z inflating: build/bin/cuda_caching_host_allocator_test 2022-05-18T04:03:56.0154254Z inflating: build/bin/cuda_reportMemoryUsage_test 2022-05-18T04:03:56.0207749Z inflating: build/bin/cuda_dlconvertor_test 2022-05-18T04:03:56.0260822Z inflating: build/bin/cuda_half_test 2022-05-18T04:03:56.0316574Z inflating: build/bin/cuda_packedtensoraccessor_test 2022-05-18T04:03:56.0401658Z inflating: build/bin/cuda_cub_test 2022-05-18T04:03:56.0453593Z inflating: build/bin/cuda_optional_test 2022-05-18T04:03:56.0516885Z inflating: build/bin/cuda_distributions_test 2022-05-18T04:03:56.0573983Z inflating: build/bin/cuda_vectorized_test 2022-05-18T04:03:56.0626100Z inflating: build/bin/cuda_cudnn_test 2022-05-18T04:03:56.0689140Z inflating: build/bin/cuda_generator_test 2022-05-18T04:03:56.0758883Z inflating: build/bin/ProcessGroupGlooTest 
2022-05-18T04:03:56.0821574Z inflating: build/bin/ProcessGroupGlooAsyncTest 2022-05-18T04:03:56.0887437Z inflating: build/bin/ProcessGroupNCCLTest 2022-05-18T04:03:56.0904444Z inflating: build/bin/tutorial_tensorexpr 2022-05-18T04:03:56.0968166Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2022-05-18T04:03:56.1025453Z inflating: build/bin/test_dist_autograd 2022-05-18T04:03:56.1099969Z inflating: build/bin/test_cpp_rpc 2022-05-18T04:03:56.1173946Z inflating: build/bin/test_mobile_nnc 2022-05-18T04:03:56.1176593Z inflating: build/bin/parallel_benchmark 2022-05-18T04:03:56.1187967Z inflating: build/bin/aot_model_compiler_test 2022-05-18T04:03:56.2105109Z inflating: build/bin/test_tensorexpr 2022-05-18T04:03:56.2489831Z inflating: build/bin/test_lazy 2022-05-18T04:03:56.2495409Z inflating: build/bin/torch_shm_manager 2022-05-18T04:03:56.2627566Z inflating: build/bin/nvfuser_bench 2022-05-18T04:03:56.3919484Z inflating: build/bin/test_api 2022-05-18T04:03:56.4872325Z inflating: build/bin/test_jit 2022-05-18T04:03:56.4873524Z inflating: .pytorch-test-times.json 2022-05-18T04:03:56.4908907Z ##[group]Run df -H 2022-05-18T04:03:56.4909162Z df -H 2022-05-18T04:03:56.4922602Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T04:03:56.4922902Z env: 2022-05-18T04:03:56.4923123Z IN_CI: 1 2022-05-18T04:03:56.4923330Z IS_GHA: 1 2022-05-18T04:03:56.4923581Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:03:56.4923851Z GPU_FLAG: --gpus all 2022-05-18T04:03:56.4924083Z ##[endgroup] 2022-05-18T04:03:56.4962920Z Filesystem Size Used Avail Use% Mounted on 2022-05-18T04:03:56.4963528Z devtmpfs 129G 0 129G 0% /dev 2022-05-18T04:03:56.4964038Z tmpfs 129G 0 129G 0% /dev/shm 2022-05-18T04:03:56.4964831Z tmpfs 129G 590k 129G 1% /run 2022-05-18T04:03:56.4965376Z tmpfs 129G 0 129G 0% /sys/fs/cgroup 2022-05-18T04:03:56.4965917Z /dev/xvda1 162G 22G 140G 14% / 2022-05-18T04:03:56.4966439Z tmpfs 26G 0 26G 0% /run/user/0 2022-05-18T04:03:56.5002754Z ##[group]Run .github/scripts/parse_ref.py 2022-05-18T04:03:56.5003394Z .github/scripts/parse_ref.py 2022-05-18T04:03:56.5022045Z shell: /usr/bin/bash -e {0} 2022-05-18T04:03:56.5022509Z env: 2022-05-18T04:03:56.5022911Z IN_CI: 1 2022-05-18T04:03:56.5023327Z IS_GHA: 1 2022-05-18T04:03:56.5024021Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:03:56.5024546Z GPU_FLAG: --gpus all 2022-05-18T04:03:56.5025008Z ##[endgroup] 2022-05-18T04:03:56.5375144Z ##[group]Run set -x 2022-05-18T04:03:56.5375537Z set -x 2022-05-18T04:03:56.5375767Z  2022-05-18T04:03:56.5376038Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2022-05-18T04:03:56.5376375Z  TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh 2022-05-18T04:03:56.5376726Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2022-05-18T04:03:56.5377064Z  TEST_COMMAND=.jenkins/caffe2/test.sh 2022-05-18T04:03:56.5377321Z else 2022-05-18T04:03:56.5377602Z  TEST_COMMAND=.jenkins/pytorch/test.sh 2022-05-18T04:03:56.5377876Z fi 2022-05-18T04:03:56.5378096Z  2022-05-18T04:03:56.5378403Z COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") 2022-05-18T04:03:56.5378753Z export COMMIT_MESSAGES 2022-05-18T04:03:56.5379006Z  2022-05-18T04:03:56.5379301Z # detached container should get cleaned up by teardown_ec2_linux 2022-05-18T04:03:56.5379735Z # TODO: Stop building test binaries as part of the build phase 2022-05-18T04:03:56.5380114Z # Used for GPU_FLAG since that doesn't play nice 2022-05-18T04:03:56.5380430Z # shellcheck disable=SC2086,SC2090 2022-05-18T04:03:56.5380735Z container_name=$(docker run \ 2022-05-18T04:03:56.5381013Z  
${GPU_FLAG:-} \ 2022-05-18T04:03:56.5381398Z  -e BUILD_ENVIRONMENT \ 2022-05-18T04:03:56.5381672Z  -e PR_NUMBER \ 2022-05-18T04:03:56.5381967Z  -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ 2022-05-18T04:03:56.5382244Z  -e GITHUB_ACTIONS \ 2022-05-18T04:03:56.5382499Z  -e IN_CI \ 2022-05-18T04:03:56.5382740Z  -e IS_GHA \ 2022-05-18T04:03:56.5382970Z  -e BRANCH \ 2022-05-18T04:03:56.5383213Z  -e SHA1 \ 2022-05-18T04:03:56.5383472Z  -e AWS_DEFAULT_REGION \ 2022-05-18T04:03:56.5384051Z  -e IN_WHEEL_TEST \ 2022-05-18T04:03:56.5384305Z  -e SHARD_NUMBER \ 2022-05-18T04:03:56.5384570Z  -e JOB_BASE_NAME \ 2022-05-18T04:03:56.5384830Z  -e TEST_CONFIG \ 2022-05-18T04:03:56.5385080Z  -e NUM_TEST_SHARDS \ 2022-05-18T04:03:56.5385342Z  -e PR_BODY \ 2022-05-18T04:03:56.5385604Z  -e COMMIT_MESSAGES \ 2022-05-18T04:03:56.5385878Z  -e PYTORCH_RETRY_TEST_CASES \ 2022-05-18T04:03:56.5386161Z  -e PR_LABELS \ 2022-05-18T04:03:56.5386450Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2022-05-18T04:03:56.5386726Z  -e SCCACHE_BUCKET \ 2022-05-18T04:03:56.5386984Z  -e XLA_CUDA \ 2022-05-18T04:03:56.5387268Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2022-05-18T04:03:56.5387596Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2022-05-18T04:03:56.5387919Z  --ulimit stack=10485760:83886080 \ 2022-05-18T04:03:56.5388233Z  --security-opt seccomp=unconfined \ 2022-05-18T04:03:56.5388546Z  --cap-add=SYS_PTRACE \ 2022-05-18T04:03:56.5388801Z  --ipc=host \ 2022-05-18T04:03:56.5389068Z  --shm-size="${SHM_SIZE}" \ 2022-05-18T04:03:56.5389329Z  --tty \ 2022-05-18T04:03:56.5389552Z  --detach \ 2022-05-18T04:03:56.5389823Z  --name="${container_name}" \ 2022-05-18T04:03:56.5390096Z  --user jenkins \ 2022-05-18T04:03:56.5390405Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2022-05-18T04:03:56.5390745Z  -w /var/lib/jenkins/workspace \ 2022-05-18T04:03:56.5391027Z  "${DOCKER_IMAGE}" 2022-05-18T04:03:56.5391251Z ) 2022-05-18T04:03:56.5391586Z docker exec -t "${container_name}" sh -c "pip install dist/*.whl && ${TEST_COMMAND}" 2022-05-18T04:03:56.5403649Z shell: /usr/bin/bash -e {0} 2022-05-18T04:03:56.5403883Z env: 2022-05-18T04:03:56.5404096Z IN_CI: 1 2022-05-18T04:03:56.5404315Z IS_GHA: 1 2022-05-18T04:03:56.5404544Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:03:56.5404910Z GPU_FLAG: --gpus all 2022-05-18T04:03:56.5405251Z BUILD_ENVIRONMENT: linux-bionic-cuda10.2-py3.9-gcc7 2022-05-18T04:03:56.5405561Z PR_NUMBER: 2022-05-18T04:03:56.5405781Z BRANCH: master 2022-05-18T04:03:56.5406083Z CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts 2022-05-18T04:03:56.5406427Z SHA1: 3b2375291aab7b48442f2e6fb1ef66cebc761e24 2022-05-18T04:03:56.5406718Z PYTORCH_RETRY_TEST_CASES: 1 2022-05-18T04:03:56.5407053Z JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test 2022-05-18T04:03:56.5407377Z TEST_CONFIG: distributed 2022-05-18T04:03:56.5407613Z SHARD_NUMBER: 2 2022-05-18T04:03:56.5407851Z NUM_TEST_SHARDS: 2 2022-05-18T04:03:56.5408088Z PR_BODY: 2022-05-18T04:03:56.5408375Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2022-05-18T04:03:56.5408679Z SHM_SIZE: 2g 2022-05-18T04:03:56.5409169Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T04:03:56.5409653Z XLA_CUDA: 2022-05-18T04:03:56.5409987Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2022-05-18T04:03:56.5410338Z ##[endgroup] 2022-05-18T04:03:56.5438459Z + [[ distributed == \m\u\l\t\i\g\p\u ]] 2022-05-18T04:03:56.5438936Z + [[ linux-bionic-cuda10.2-py3.9-gcc7 
== *onnx* ]] 2022-05-18T04:03:56.5439260Z + TEST_COMMAND=.jenkins/pytorch/test.sh 2022-05-18T04:03:56.5441994Z ++ git cherry -v origin/master 2022-05-18T04:03:56.5475348Z + COMMIT_MESSAGES= 2022-05-18T04:03:56.5475808Z + export COMMIT_MESSAGES 2022-05-18T04:03:56.5484806Z +++ nproc --ignore=2 2022-05-18T04:03:56.5497015Z ++ docker run --gpus all -e BUILD_ENVIRONMENT -e PR_NUMBER -e CUSTOM_TEST_ARTIFACT_BUILD_DIR -e GITHUB_ACTIONS -e IN_CI -e IS_GHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e JOB_BASE_NAME -e TEST_CONFIG -e NUM_TEST_SHARDS -e PR_BODY -e COMMIT_MESSAGES -e PYTORCH_RETRY_TEST_CASES -e PR_LABELS -e MAX_JOBS=30 -e SCCACHE_BUCKET -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME --env-file=/tmp/github_env_2342799949 --ulimit stack=10485760:83886080 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T04:04:19.8872447Z + container_name=04c3040422fca0f61bbc0b8d1c290660850f2b3df08e97daf38cf60eb8907ef4 2022-05-18T04:04:19.8874708Z + docker exec -t 04c3040422fca0f61bbc0b8d1c290660850f2b3df08e97daf38cf60eb8907ef4 sh -c 'pip install dist/*.whl && .jenkins/pytorch/test.sh' 2022-05-18T04:04:20.3862246Z Processing ./dist/torch-1.12.0a0+git3b23752-cp39-cp39-linux_x86_64.whl 2022-05-18T04:04:20.4812410Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.9/site-packages (from torch==1.12.0a0+git3b23752) (4.2.0) 2022-05-18T04:04:21.0231100Z Installing collected packages: torch 2022-05-18T04:04:30.5485790Z Successfully installed torch-1.12.0a0+git3b23752 2022-05-18T04:04:30.6056975Z + COMPACT_JOB_NAME=linux-bionic-cuda10.2-py3.9-gcc7 2022-05-18T04:04:30.6059499Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2022-05-18T04:04:30.6340115Z + TORCH_INSTALL_DIR=/opt/conda/lib/python3.9/site-packages/torch 2022-05-18T04:04:30.6342408Z + TORCH_BIN_DIR=/opt/conda/lib/python3.9/site-packages/torch/bin 2022-05-18T04:04:30.6343183Z + TORCH_LIB_DIR=/opt/conda/lib/python3.9/site-packages/torch/lib 2022-05-18T04:04:30.6344381Z + TORCH_TEST_DIR=/opt/conda/lib/python3.9/site-packages/torch/test 2022-05-18T04:04:30.6344721Z + BUILD_DIR=build 2022-05-18T04:04:30.6344990Z + BUILD_RENAMED_DIR=build_renamed 2022-05-18T04:04:30.6345260Z + BUILD_BIN_DIR=build/bin 2022-05-18T04:04:30.6345586Z + [[ -n distributed ]] 2022-05-18T04:04:30.6346028Z + BUILD_ENVIRONMENT=linux-bionic-cuda10.2-py3.9-gcc7-distributed 2022-05-18T04:04:30.6346785Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed != *bazel* ]] 2022-05-18T04:04:30.6347168Z ++ realpath build/custom_test_artifacts 2022-05-18T04:04:30.6351975Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2022-05-18T04:04:30.6355530Z ++ dirname .jenkins/pytorch/test.sh 2022-05-18T04:04:30.6362379Z + source .jenkins/pytorch/common.sh 2022-05-18T04:04:30.6366481Z +++ dirname .jenkins/pytorch/common.sh 2022-05-18T04:04:30.6376619Z ++ source .jenkins/pytorch/common_utils.sh 2022-05-18T04:04:30.6380545Z +++ TORCHVISION_COMMIT=8a2dc6f22ac4389ccba8859aa1e1cb14f1ee53db 2022-05-18T04:04:30.6380897Z ++ set -ex 2022-05-18T04:04:30.6389483Z ++++ dirname .jenkins/pytorch/common.sh 2022-05-18T04:04:30.6399414Z +++ cd .jenkins/pytorch 2022-05-18T04:04:30.6399696Z +++ 
pwd -P 2022-05-18T04:04:30.6402536Z ++ SCRIPT_DIR=/var/lib/jenkins/workspace/.jenkins/pytorch 2022-05-18T04:04:30.6403057Z ++ [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *linux* ]] 2022-05-18T04:04:30.6406124Z +++ find /etc/apt/ -type f -name '*.list' 2022-05-18T04:04:30.6425499Z ++ sudo sed -i 's/.*nvidia.*/# &/' /etc/apt/sources.list /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list /etc/apt/sources.list.d/ubuntu-toolchain-r-ubuntu-test-bionic.list 2022-05-18T04:04:30.6490262Z ++ [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *rocm* ]] 2022-05-18T04:04:30.6490770Z ++ echo ENTERED_USER_LAND 2022-05-18T04:04:30.6491221Z ENTERED_USER_LAND 2022-05-18T04:04:30.6491452Z ++ export IN_CI=1 2022-05-18T04:04:30.6491689Z ++ IN_CI=1 2022-05-18T04:04:30.6492151Z ++ declare -f -t trap_add 2022-05-18T04:04:30.6492440Z ++ trap_add cleanup EXIT 2022-05-18T04:04:30.6492693Z ++ trap_add_cmd=cleanup 2022-05-18T04:04:30.6492942Z ++ shift 2022-05-18T04:04:30.6493187Z ++ for trap_add_name in "$@" 2022-05-18T04:04:30.6500538Z ++++ trap -p EXIT 2022-05-18T04:04:30.6503866Z +++ eval 'extract_trap_cmd ' 2022-05-18T04:04:30.6504460Z ++++ extract_trap_cmd 2022-05-18T04:04:30.6505276Z ++++ printf '%s\n' '' 2022-05-18T04:04:30.6505701Z +++ printf '%s\n' cleanup 2022-05-18T04:04:30.6507914Z ++ trap -- ' 2022-05-18T04:04:30.6508203Z cleanup' EXIT 2022-05-18T04:04:30.6510319Z ++ [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed != *win-* ]] 2022-05-18T04:04:30.6510651Z ++ which sccache 2022-05-18T04:04:30.6521095Z ++ sccache --stop-server 2022-05-18T04:04:30.6549947Z ++ true 2022-05-18T04:04:30.6550650Z ++ rm -f /var/lib/jenkins/sccache_error.log 2022-05-18T04:04:30.6559335Z ++ [[ -n '' ]] 2022-05-18T04:04:30.6559792Z ++ [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *rocm* ]] 2022-05-18T04:04:30.6560196Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2022-05-18T04:04:30.6560495Z ++ SCCACHE_IDLE_TIMEOUT=1200 2022-05-18T04:04:30.6577771Z ++ RUST_LOG=sccache::server=error 2022-05-18T04:04:30.6578168Z ++ sccache --start-server 2022-05-18T04:04:30.6581351Z sccache: Starting the server... 
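The traced commands above show common.sh restarting sccache before the test phase. A minimal sketch of that warm-up sequence, reconstructed from the trace (the error-log path and idle timeout are copied from the log; this is illustrative, not the canonical script):

    # Restart sccache with a clean slate before running tests
    sccache --stop-server || true              # tolerate "no server running" (the trace shows "++ true" after this)
    rm -f /var/lib/jenkins/sccache_error.log
    export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
    export SCCACHE_IDLE_TIMEOUT=1200
    RUST_LOG=sccache::server=error sccache --start-server
    sccache --zero-stats                       # reset counters so later stats reflect only this job

The --zero-stats output that follows confirms the cache starts empty and points at the ossci-compiler-cache-circleci-v2 S3 bucket.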
2022-05-18T04:04:30.6771757Z ++ sccache --zero-stats 2022-05-18T04:04:30.6793928Z Compile requests 0 2022-05-18T04:04:30.6794231Z Compile requests executed 0 2022-05-18T04:04:30.6794542Z Cache hits 0 2022-05-18T04:04:30.6794816Z Cache misses 0 2022-05-18T04:04:30.6795077Z Cache timeouts 0 2022-05-18T04:04:30.6795356Z Cache read errors 0 2022-05-18T04:04:30.6795751Z Forced recaches 0 2022-05-18T04:04:30.6796040Z Cache write errors 0 2022-05-18T04:04:30.6796340Z Compilation failures 0 2022-05-18T04:04:30.6796627Z Cache errors 0 2022-05-18T04:04:30.6796987Z Non-cacheable compilations 0 2022-05-18T04:04:30.6797320Z Non-cacheable calls 0 2022-05-18T04:04:30.6797667Z Non-compilation calls 0 2022-05-18T04:04:30.6797971Z Unsupported compiler calls 0 2022-05-18T04:04:30.6798253Z Average cache write 0.000 s 2022-05-18T04:04:30.6798550Z Average cache read miss 0.000 s 2022-05-18T04:04:30.6799024Z Average cache read hit 0.000 s 2022-05-18T04:04:30.6799334Z Failed distributed compilations 0 2022-05-18T04:04:30.6800066Z Cache location S3, bucket: Bucket(name=ossci-compiler-cache-circleci-v2, base_url=http://ossci-compiler-cache-circleci-v2.s3.amazonaws.com/) 2022-05-18T04:04:30.6800743Z ++ [[ linux-bionic-cuda10.2-py3.9-gcc7-test == *-build ]] 2022-05-18T04:04:30.6801070Z ++ which ccache 2022-05-18T04:04:30.6808543Z ++ '[' -z linux-bionic-cuda10.2-py3.9-gcc7 ']' 2022-05-18T04:04:30.6809110Z ++ [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *linux-trusty-py3.6-gcc7* ]] 2022-05-18T04:04:30.6809513Z ++ BUILD_TEST_LIBTORCH=0 2022-05-18T04:04:30.6809792Z ++ [[ distributed == *xla* ]] 2022-05-18T04:04:30.6810213Z ++ [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *centos* ]] 2022-05-18T04:04:30.6810746Z ++ [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *linux-bionic* ]] 2022-05-18T04:04:30.6811094Z ++ which conda 2022-05-18T04:04:30.6818130Z /opt/conda/bin/conda 2022-05-18T04:04:30.6818943Z ++ conda install -q -y cmake 2022-05-18T04:04:36.3626475Z Collecting package metadata (current_repodata.json): ...working... done 2022-05-18T04:04:36.8566690Z Solving environment: ...working... 
done 2022-05-18T04:04:36.9393175Z 2022-05-18T04:04:36.9393304Z ## Package Plan ## 2022-05-18T04:04:36.9393917Z 2022-05-18T04:04:36.9394429Z environment location: /opt/conda 2022-05-18T04:04:36.9394635Z 2022-05-18T04:04:36.9394767Z added / updated specs: 2022-05-18T04:04:36.9395442Z - cmake 2022-05-18T04:04:36.9395697Z 2022-05-18T04:04:36.9395718Z 2022-05-18T04:04:36.9395879Z The following packages will be downloaded: 2022-05-18T04:04:36.9396081Z 2022-05-18T04:04:36.9396200Z package | build 2022-05-18T04:04:36.9396578Z ---------------------------|----------------- 2022-05-18T04:04:36.9397000Z bzip2-1.0.8 | h7b6447c_0 78 KB 2022-05-18T04:04:36.9397608Z c-ares-1.18.1 | h7f8727e_0 114 KB 2022-05-18T04:04:36.9398007Z cmake-3.22.1 | h1fce559_0 7.3 MB 2022-05-18T04:04:36.9398391Z expat-2.4.4 | h295c915_0 169 KB 2022-05-18T04:04:36.9398771Z krb5-1.19.2 | hac12032_0 1.2 MB 2022-05-18T04:04:36.9399141Z libcurl-7.82.0 | h0b77cf5_0 342 KB 2022-05-18T04:04:36.9399541Z libedit-3.1.20210910 | h7f8727e_0 166 KB 2022-05-18T04:04:36.9399928Z libev-4.33 | h7f8727e_1 111 KB 2022-05-18T04:04:36.9400306Z libnghttp2-1.46.0 | hce63b2e_0 680 KB 2022-05-18T04:04:36.9400730Z libssh2-1.10.0 | h8f2d780_0 274 KB 2022-05-18T04:04:36.9401106Z libuv-1.40.0 | h7b6447c_0 736 KB 2022-05-18T04:04:36.9401479Z lz4-c-1.9.3 | h295c915_1 185 KB 2022-05-18T04:04:36.9401843Z rhash-1.4.1 | h3c74f83_1 203 KB 2022-05-18T04:04:36.9402219Z zstd-1.5.2 | ha4553b6_0 488 KB 2022-05-18T04:04:36.9402621Z ------------------------------------------------------------ 2022-05-18T04:04:36.9402931Z Total: 12.0 MB 2022-05-18T04:04:36.9403111Z 2022-05-18T04:04:36.9403272Z The following NEW packages will be INSTALLED: 2022-05-18T04:04:36.9403480Z 2022-05-18T04:04:36.9403831Z bzip2 pkgs/main/linux-64::bzip2-1.0.8-h7b6447c_0 2022-05-18T04:04:36.9404306Z c-ares pkgs/main/linux-64::c-ares-1.18.1-h7f8727e_0 2022-05-18T04:04:36.9404758Z cmake pkgs/main/linux-64::cmake-3.22.1-h1fce559_0 2022-05-18T04:04:36.9405220Z expat pkgs/main/linux-64::expat-2.4.4-h295c915_0 2022-05-18T04:04:36.9405678Z krb5 pkgs/main/linux-64::krb5-1.19.2-hac12032_0 2022-05-18T04:04:36.9406146Z libcurl pkgs/main/linux-64::libcurl-7.82.0-h0b77cf5_0 2022-05-18T04:04:36.9406756Z libedit pkgs/main/linux-64::libedit-3.1.20210910-h7f8727e_0 2022-05-18T04:04:36.9407256Z libev pkgs/main/linux-64::libev-4.33-h7f8727e_1 2022-05-18T04:04:36.9407745Z libnghttp2 pkgs/main/linux-64::libnghttp2-1.46.0-hce63b2e_0 2022-05-18T04:04:36.9408221Z libssh2 pkgs/main/linux-64::libssh2-1.10.0-h8f2d780_0 2022-05-18T04:04:36.9408700Z libuv pkgs/main/linux-64::libuv-1.40.0-h7b6447c_0 2022-05-18T04:04:36.9409162Z lz4-c pkgs/main/linux-64::lz4-c-1.9.3-h295c915_1 2022-05-18T04:04:36.9409617Z rhash pkgs/main/linux-64::rhash-1.4.1-h3c74f83_1 2022-05-18T04:04:36.9410059Z zstd pkgs/main/linux-64::zstd-1.5.2-ha4553b6_0 2022-05-18T04:04:36.9410264Z 2022-05-18T04:04:36.9410547Z The following packages will be SUPERSEDED by a higher-priority channel: 2022-05-18T04:04:36.9410795Z 2022-05-18T04:04:36.9411194Z certifi conda-forge::certifi-2021.10.8-py39hf~ --> pkgs/main::certifi-2021.10.8-py39h06a4308_2 2022-05-18T04:04:36.9411823Z conda conda-forge::conda-4.12.0-py39hf3d152~ --> pkgs/main::conda-4.12.0-py39h06a4308_0 2022-05-18T04:04:36.9412090Z 2022-05-18T04:04:36.9412108Z 2022-05-18T04:04:37.9278752Z Preparing transaction: ...working... done 2022-05-18T04:04:38.4637358Z Verifying transaction: ...working... done 2022-05-18T04:04:41.0451622Z Executing transaction: ...working... 
done 2022-05-18T04:04:41.7838113Z ++ [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *centos* ]] 2022-05-18T04:04:41.7838575Z + echo 'Testing pytorch' 2022-05-18T04:04:41.7838840Z Testing pytorch 2022-05-18T04:04:41.7839109Z + export LANG=C.UTF-8 2022-05-18T04:04:41.7839376Z + LANG=C.UTF-8 2022-05-18T04:04:41.7840933Z + PR_NUMBER= 2022-05-18T04:04:41.7841189Z + [[ distributed == \d\e\f\a\u\l\t ]] 2022-05-18T04:04:41.7841497Z + [[ distributed == \d\i\s\t\r\i\b\u\t\e\d ]] 2022-05-18T04:04:41.7841964Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *rocm* ]] 2022-05-18T04:04:41.7842490Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *-slow-* ]] 2022-05-18T04:04:41.7842831Z + [[ distributed == \s\l\o\w ]] 2022-05-18T04:04:41.7843281Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *slow-gradcheck* ]] 2022-05-18T04:04:41.7843816Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *cuda* ]] 2022-05-18T04:04:41.7844184Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2022-05-18T04:04:41.7844517Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2022-05-18T04:04:41.7844971Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *cuda11* ]] 2022-05-18T04:04:41.7845462Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *crossref* ]] 2022-05-18T04:04:41.7845832Z + [[ -n '' ]] 2022-05-18T04:04:41.7846120Z + export PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=0 2022-05-18T04:04:41.7846446Z + PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=0 2022-05-18T04:04:41.7846871Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *rocm* ]] 2022-05-18T04:04:41.7847390Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed != *ppc64le* ]] 2022-05-18T04:04:41.7847896Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed != *-bazel-* ]] 2022-05-18T04:04:41.7848268Z + pip_install --user ninja 2022-05-18T04:04:41.7848639Z + pip install --progress-bar off --user ninja 2022-05-18T04:04:42.3091595Z Collecting ninja 2022-05-18T04:04:42.3314500Z Downloading ninja-1.10.2.3-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2022-05-18T04:04:42.3390800Z [?25l 2022-05-18T04:04:42.8320963Z [?25hInstalling collected packages: ninja 2022-05-18T04:04:42.8435543Z  WARNING: The script ninja is installed in '/var/lib/jenkins/.local/bin' which is not on PATH. 2022-05-18T04:04:42.8436183Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 
2022-05-18T04:04:42.8498526Z Successfully installed ninja-1.10.2.3 2022-05-18T04:04:42.9040074Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2022-05-18T04:04:42.9040779Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2022-05-18T04:04:42.9041632Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *asan* ]] 2022-05-18T04:04:42.9042915Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *-NO_AVX-* ]] 2022-05-18T04:04:42.9043742Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X ]] 2022-05-18T04:04:42.9044655Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *-NO_AVX2-* ]] 2022-05-18T04:04:42.9045454Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2022-05-18T04:04:42.9046396Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *-NO_AVX512-* ]] 2022-05-18T04:04:42.9047161Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\5\1\2 ]] 2022-05-18T04:04:42.9050688Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *tbb* ]] 2022-05-18T04:04:42.9065532Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *libtorch* ]] 2022-05-18T04:04:42.9066087Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *-bazel-* ]] 2022-05-18T04:04:42.9068782Z + cd test 2022-05-18T04:04:42.9069175Z + python -c 'import torch; print(torch.__config__.show())' 2022-05-18T04:04:47.1843846Z PyTorch built with: 2022-05-18T04:04:47.1844332Z - GCC 7.5 2022-05-18T04:04:47.1844643Z - C++ Version: 201402 2022-05-18T04:04:47.1845548Z - Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2022-05-18T04:04:47.1846125Z - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815) 2022-05-18T04:04:47.1846532Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2022-05-18T04:04:47.1846894Z - LAPACK is enabled (usually provided by MKL) 2022-05-18T04:04:47.1847225Z - NNPACK is enabled 2022-05-18T04:04:47.1847550Z - CPU capability usage: AVX2 2022-05-18T04:04:47.1847844Z - CUDA Runtime 10.2 2022-05-18T04:04:47.1848245Z - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52 2022-05-18T04:04:47.1848593Z - CuDNN 7.6.5 2022-05-18T04:04:47.1848845Z - Magma 2.5.2 2022-05-18T04:04:47.1851808Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Werror -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 2022-05-18T04:04:47.1853984Z 2022-05-18T04:04:47.7684530Z + cd test 2022-05-18T04:04:47.7685074Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2022-05-18T04:04:48.5517851Z ATen/Parallel: 2022-05-18T04:04:48.5518226Z at::get_num_threads() : 16 2022-05-18T04:04:48.5518509Z at::get_num_interop_threads() : 16 2022-05-18T04:04:48.5518819Z OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2022-05-18T04:04:48.5519106Z omp_get_max_threads() : 16 2022-05-18T04:04:48.5519755Z Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2022-05-18T04:04:48.5520165Z mkl_get_max_threads() : 16 2022-05-18T04:04:48.5520934Z Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815) 2022-05-18T04:04:48.5521338Z std::thread::hardware_concurrency() : 32 2022-05-18T04:04:48.5521619Z Environment variables: 2022-05-18T04:04:48.5521896Z OMP_NUM_THREADS : [not set] 2022-05-18T04:04:48.5522176Z MKL_NUM_THREADS : [not set] 2022-05-18T04:04:48.5522438Z ATen parallel backend: OpenMP 2022-05-18T04:04:48.5522633Z 2022-05-18T04:04:48.6597772Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *deploy* ]] 2022-05-18T04:04:48.6598361Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *backward* ]] 2022-05-18T04:04:48.6598722Z + [[ distributed == *xla* ]] 2022-05-18T04:04:48.6599178Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *jit_legacy-test ]] 2022-05-18T04:04:48.6599678Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-test == *jit_legacy-test ]] 2022-05-18T04:04:48.6600054Z + [[ distributed == \j\i\t\_\l\e\g\a\c\y ]] 2022-05-18T04:04:48.6600527Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *libtorch* ]] 2022-05-18T04:04:48.6601039Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *distributed* ]] 2022-05-18T04:04:48.6601400Z + test_distributed 2022-05-18T04:04:48.6601742Z + echo 'Testing distributed python tests' 2022-05-18T04:04:48.6602056Z Testing distributed python tests 2022-05-18T04:04:48.6602470Z + python test/run_test.py --distributed-tests --shard 2 2 --verbose 2022-05-18T04:04:54.8038656Z Ignoring disabled issues: [] 2022-05-18T04:04:54.8164535Z /var/lib/jenkins/workspace/test/run_test.py:894: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. 2022-05-18T04:04:54.8165070Z if torch.version.cuda is not None and LooseVersion(torch.version.cuda) == "11.6": 2022-05-18T04:04:54.8229499Z Found stats for current commit: 3b2375291aab7b48442f2e6fb1ef66cebc761e24 and job: linux-bionic-cuda10.2-py3.9-gcc7. Proceeding with those values. 
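The selection output below comes from run_test.py, which test.sh invoked as shard 2 of 2 of the distributed tests (see the `+ python test/run_test.py --distributed-tests --shard 2 2 --verbose` line above). To reproduce roughly this shard by hand inside the same container, the two commands from the trace can be reused directly (illustrative; paths assume the jenkins workspace layout shown in the log):

    # Run from /var/lib/jenkins/workspace as the jenkins user inside the test container
    pip install dist/*.whl                                          # install the wheel built earlier in the workflow
    python test/run_test.py --distributed-tests --shard 2 2 --verbose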
2022-05-18T04:04:54.8230341Z Selected tests: 2022-05-18T04:04:54.8230892Z distributed/rpc/cuda/test_tensorpipe_agent 2022-05-18T04:04:54.8231630Z distributed/fsdp/test_fsdp_core 2022-05-18T04:04:54.8232180Z distributed/test_c10d_nccl 2022-05-18T04:04:54.8232467Z distributed/test_c10d_gloo 2022-05-18T04:04:54.8232757Z distributed/fsdp/test_fsdp_mixed_precision 2022-05-18T04:04:54.8233109Z distributed/fsdp/test_fsdp_summon_full_params 2022-05-18T04:04:54.8233461Z distributed/optim/test_zero_redundancy_optimizer 2022-05-18T04:04:54.8233837Z distributed/_shard/sharded_tensor/test_sharded_tensor 2022-05-18T04:04:54.8234146Z distributed/test_pg_wrapper 2022-05-18T04:04:54.8234448Z distributed/fsdp/test_fsdp_grad_acc 2022-05-18T04:04:54.8235389Z distributed/test_c10d_spawn_gloo 2022-05-18T04:04:54.8235674Z distributed/fsdp/test_fsdp_comm 2022-05-18T04:04:54.8236009Z distributed/fsdp/test_fsdp_sharded_grad_scaler 2022-05-18T04:04:54.8236328Z distributed/algorithms/test_join 2022-05-18T04:04:54.8236604Z distributed/fsdp/test_fsdp_misc 2022-05-18T04:04:54.8236929Z distributed/_shard/checkpoint/test_checkpoint 2022-05-18T04:04:54.8237453Z distributed/_shard/sharded_tensor/ops/test_matrix_ops 2022-05-18T04:04:54.8237980Z distributed/fsdp/test_fsdp_memory 2022-05-18T04:04:54.8238309Z distributed/_shard/checkpoint/test_file_system_checkpoint 2022-05-18T04:04:54.8238668Z distributed/elastic/timer/local_timer_example 2022-05-18T04:04:54.8238993Z distributed/_shard/test_partial_tensor 2022-05-18T04:04:54.8239292Z distributed/fsdp/test_fsdp_input 2022-05-18T04:04:54.8239624Z distributed/_shard/sharded_tensor/ops/test_tensor_ops 2022-05-18T04:04:54.8239978Z distributed/_shard/sharded_tensor/ops/test_linear 2022-05-18T04:04:54.8240299Z distributed/elastic/timer/local_timer_test 2022-05-18T04:04:54.8240613Z distributed/fsdp/test_fsdp_uneven 2022-05-18T04:04:54.8240916Z distributed/fsdp/test_fsdp_pure_fp16 2022-05-18T04:04:54.8241207Z distributed/fsdp/test_fsdp_traversal 2022-05-18T04:04:54.8241544Z distributed/_shard/sharded_tensor/ops/test_embedding 2022-05-18T04:04:54.8242098Z distributed/_shard/sharded_tensor/ops/test_chunk 2022-05-18T04:04:54.8242473Z distributed/_shard/sharded_tensor/ops/test_softmax 2022-05-18T04:04:54.8242778Z distributed/test_data_parallel 2022-05-18T04:04:54.8243102Z distributed/fsdp/test_flatten_params_wrapper 2022-05-18T04:04:54.8243431Z distributed/elastic/utils/logging_test 2022-05-18T04:04:54.8243723Z distributed/elastic/metrics/api_test 2022-05-18T04:04:54.8244017Z distributed/test_nccl 2022-05-18T04:04:54.8244332Z distributed/_shard/sharded_tensor/ops/test_math_ops 2022-05-18T04:04:54.8244658Z distributed/_shard/test_replicated_tensor 2022-05-18T04:04:54.8244972Z distributed/elastic/events/lib_test 2022-05-18T04:04:54.8245276Z distributed/fsdp/test_shard_utils 2022-05-18T04:04:54.8245574Z distributed/pipeline/sync/skip/test_gpipe 2022-05-18T04:04:54.8245902Z distributed/pipeline/sync/skip/test_leak 2022-05-18T04:04:54.8246239Z distributed/pipeline/sync/skip/test_stash_pop 2022-05-18T04:04:54.8246593Z distributed/pipeline/sync/skip/test_verify_skippables 2022-05-18T04:04:54.8246937Z distributed/pipeline/sync/test_bugs 2022-05-18T04:04:54.8247249Z distributed/pipeline/sync/test_copy 2022-05-18T04:04:54.8247578Z distributed/pipeline/sync/test_dependency 2022-05-18T04:04:54.8247890Z distributed/pipeline/sync/test_microbatch 2022-05-18T04:04:54.8248207Z distributed/pipeline/sync/test_pipe 2022-05-18T04:04:54.8248521Z distributed/pipeline/sync/test_stream 2022-05-18T04:04:54.8248938Z 
distributed/pipeline/sync/test_worker 2022-05-18T04:04:54.8249255Z distributed/rpc/test_tensorpipe_agent 2022-05-18T04:04:54.8302969Z Prioritized test from test file changes. 2022-05-18T04:04:54.8303778Z reordering tests for PR: 2022-05-18T04:04:54.8304333Z prioritized: [] 2022-05-18T04:04:54.8314510Z the rest: ['distributed/rpc/cuda/test_tensorpipe_agent', 'distributed/fsdp/test_fsdp_core', 'distributed/test_c10d_nccl', 'distributed/test_c10d_gloo', 'distributed/fsdp/test_fsdp_mixed_precision', 'distributed/fsdp/test_fsdp_summon_full_params', 'distributed/optim/test_zero_redundancy_optimizer', 'distributed/_shard/sharded_tensor/test_sharded_tensor', 'distributed/test_pg_wrapper', 'distributed/fsdp/test_fsdp_grad_acc', 'distributed/test_c10d_spawn_gloo', 'distributed/fsdp/test_fsdp_comm', 'distributed/fsdp/test_fsdp_sharded_grad_scaler', 'distributed/algorithms/test_join', 'distributed/fsdp/test_fsdp_misc', 'distributed/_shard/checkpoint/test_checkpoint', 'distributed/_shard/sharded_tensor/ops/test_matrix_ops', 'distributed/fsdp/test_fsdp_memory', 'distributed/_shard/checkpoint/test_file_system_checkpoint', 'distributed/elastic/timer/local_timer_example', 'distributed/_shard/test_partial_tensor', 'distributed/fsdp/test_fsdp_input', 'distributed/_shard/sharded_tensor/ops/test_tensor_ops', 'distributed/_shard/sharded_tensor/ops/test_linear', 'distributed/elastic/timer/local_timer_test', 'distributed/fsdp/test_fsdp_uneven', 'distributed/fsdp/test_fsdp_pure_fp16', 'distributed/fsdp/test_fsdp_traversal', 'distributed/_shard/sharded_tensor/ops/test_embedding', 'distributed/_shard/sharded_tensor/ops/test_chunk', 'distributed/_shard/sharded_tensor/ops/test_softmax', 'distributed/test_data_parallel', 'distributed/fsdp/test_flatten_params_wrapper', 'distributed/elastic/utils/logging_test', 'distributed/elastic/metrics/api_test', 'distributed/test_nccl', 'distributed/_shard/sharded_tensor/ops/test_math_ops', 'distributed/_shard/test_replicated_tensor', 'distributed/elastic/events/lib_test', 'distributed/fsdp/test_shard_utils', 'distributed/pipeline/sync/skip/test_gpipe', 'distributed/pipeline/sync/skip/test_leak', 'distributed/pipeline/sync/skip/test_stash_pop', 'distributed/pipeline/sync/skip/test_verify_skippables', 'distributed/pipeline/sync/test_bugs', 'distributed/pipeline/sync/test_copy', 'distributed/pipeline/sync/test_dependency', 'distributed/pipeline/sync/test_microbatch', 'distributed/pipeline/sync/test_pipe', 'distributed/pipeline/sync/test_stream', 'distributed/pipeline/sync/test_worker', 'distributed/rpc/test_tensorpipe_agent'] 2022-05-18T04:04:54.8320944Z 2022-05-18T04:04:54.8790398Z Running distributed/rpc/cuda/test_tensorpipe_agent ... [2022-05-18 04:04:54.878629] 2022-05-18T04:04:54.8791935Z Executing ['/opt/conda/bin/python', 'distributed/rpc/cuda/test_tensorpipe_agent.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 04:04:54.878695] 2022-05-18T04:04:55.7658297Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6elzp31m 2022-05-18T04:04:55.7659523Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6elzp31m/_remote_module_non_scriptable.py 2022-05-18T04:04:56.1734095Z ]> 2022-05-18T04:04:56.1734869Z test_ddp_dist_autograd_local_vs_remote_gpu (__main__.TensorPipeCudaDdpComparisonTest) 2022-05-18T04:04:56.1735921Z , <__main__.TensorPipeCudaDistAutogradTest testMethod=test_gpu_to_cpu_continuation>, <__main__.TensorPipeCudaDistAutogradTest testMethod=test_gpu_to_cpu_continuation_gpu_root>]> 2022-05-18T04:04:56.1736674Z test_gpu_simple (__main__.TensorPipeCudaDistAutogradTest) 2022-05-18T04:04:56.1737111Z test_gpu_to_cpu_continuation (__main__.TensorPipeCudaDistAutogradTest) 2022-05-18T04:04:56.1737557Z test_gpu_to_cpu_continuation_gpu_root (__main__.TensorPipeCudaDistAutogradTest) 2022-05-18T04:04:56.1739061Z , <__main__.TensorPipeCudaRemoteModuleTest testMethod=test_input_moved_to_cuda_device_script>, <__main__.TensorPipeCudaRemoteModuleTest testMethod=test_invalid_devices>, <__main__.TensorPipeCudaRemoteModuleTest testMethod=test_valid_device>]> 2022-05-18T04:04:56.1740122Z test_input_moved_to_cuda_device (__main__.TensorPipeCudaRemoteModuleTest) 2022-05-18T04:04:56.1740803Z test_input_moved_to_cuda_device_script (__main__.TensorPipeCudaRemoteModuleTest) 2022-05-18T04:04:56.1741491Z test_invalid_devices (__main__.TensorPipeCudaRemoteModuleTest) 2022-05-18T04:04:56.1742117Z test_valid_device (__main__.TensorPipeCudaRemoteModuleTest) 2022-05-18T04:04:56.1742611Z ]> 2022-05-18T04:04:56.1743082Z test_profiler_remote_cuda (__main__.TensorPipeCudaRpcTest) 2022-05-18T04:04:56.1744784Z , <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_gloo_ckpt_except_last>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_gloo_ckpt_never>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_gloo_ckpt_never_find_unused>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_nccl_ckpt_always>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_nccl_ckpt_except_last>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_nccl_ckpt_never>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_nccl_ckpt_never_find_unused>]> 2022-05-18T04:04:56.1747091Z test_basic_gloo_ckpt_always (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:04:56.1747773Z test_basic_gloo_ckpt_except_last (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:04:56.1748195Z test_basic_gloo_ckpt_never (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:04:56.1748887Z test_basic_gloo_ckpt_never_find_unused (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:04:56.1749338Z test_basic_nccl_ckpt_always (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:04:56.1749766Z test_basic_nccl_ckpt_except_last (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:04:56.1750170Z test_basic_nccl_ckpt_never (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:04:56.1750596Z test_basic_nccl_ckpt_never_find_unused (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:04:56.1765247Z , <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_async_execution_with_cuda_future>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_callback_changes_devices>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_cuda_sparse_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_cuda_tensor>, 
<__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_custom_class_with_cuda_sparse_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_custom_class_with_cuda_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_list_with_cuda_sparse_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_list_with_cuda_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_device_as_device>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_device_as_int>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_device_as_str>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_device_not_cuda>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_modify_tensor_inplace>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_replace_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_value_on_bad_device>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_custom_stream>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_custom_stream_multi>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_custom_stream_nested>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_custom_stream_nested_multi>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_cpu>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_cpu_to_gpu_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_cpu_to_gpu_non_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_default_to_non_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_5>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_6>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_7>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_8>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_5>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_6>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_7>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_8>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_non_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest 
testMethod=test_device_map_gpu_non_default_to_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_to_cpu_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_to_cpu_non_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_gpu>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_in_options>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_invalid_max_local_device>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_invalid_max_remote_device>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_invalid_min_device>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_many_to_one>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_loop>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_not_timeout>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_remote>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_remote_response>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_response>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_response_loop>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_multi_gpu>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_multi_gpu_self>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_one_to_many>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_remote>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_return_to_gpu>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_return_to_gpu_self>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_wrong_worker_name>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_mismatch>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_devices_option_mismatch>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_devices_option_mismatch_reverse>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_meta_multiple_tensors>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_owner_rref_forward_synchronization1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_owner_rref_forward_synchronization2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_owner_rref_forward_synchronization3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_owner_rref_forward_synchronization4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_as_arg_synchronization1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_as_arg_synchronization2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_as_arg_synchronization3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_as_arg_synchronization4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_as_arg_synchronization5>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_forward_synchronization1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest 
testMethod=test_rref_forward_synchronization2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_forward_synchronization3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_forward_synchronization4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_to_here_synchronization1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_to_here_synchronization2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_to_here_synchronization3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_to_here_synchronization4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_with_unpickleable_attributes>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_tensor_view_as_return_value>]> 2022-05-18T04:04:56.1779305Z test_async_execution_nested_with_cuda_future (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1779820Z test_async_execution_with_cuda_future (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1780343Z test_cuda_future_callback_changes_devices (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1780881Z test_cuda_future_can_extract_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1781392Z test_cuda_future_can_extract_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1781944Z test_cuda_future_can_extract_custom_class_with_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1782513Z test_cuda_future_can_extract_custom_class_with_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1783081Z test_cuda_future_can_extract_list_with_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1783968Z test_cuda_future_can_extract_list_with_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1784649Z test_cuda_future_device_as_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1785147Z test_cuda_future_device_as_int (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1785739Z test_cuda_future_device_as_str (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1786211Z test_cuda_future_device_not_cuda (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1786715Z test_cuda_future_modify_tensor_inplace (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1787213Z test_cuda_future_replace_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1787694Z test_cuda_future_value_on_bad_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1788175Z test_custom_stream (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1788639Z test_custom_stream_multi (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1789113Z test_custom_stream_nested (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1789577Z test_custom_stream_nested_multi (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1790048Z test_device_map_cpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1790538Z test_device_map_cpu_to_gpu_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1791027Z test_device_map_cpu_to_gpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1791526Z test_device_map_gpu_default 
(__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1792030Z test_device_map_gpu_default_to_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1792525Z test_device_map_gpu_mixed_1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1792981Z test_device_map_gpu_mixed_2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1793460Z test_device_map_gpu_mixed_3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1793930Z test_device_map_gpu_mixed_4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1794377Z test_device_map_gpu_mixed_5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1794854Z test_device_map_gpu_mixed_6 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1795324Z test_device_map_gpu_mixed_7 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1795775Z test_device_map_gpu_mixed_8 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1796253Z test_device_map_gpu_mixed_self_1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1796744Z test_device_map_gpu_mixed_self_2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1797305Z test_device_map_gpu_mixed_self_3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1797781Z test_device_map_gpu_mixed_self_4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1798256Z test_device_map_gpu_mixed_self_5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1798734Z test_device_map_gpu_mixed_self_6 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1799198Z test_device_map_gpu_mixed_self_7 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1799683Z test_device_map_gpu_mixed_self_8 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1800169Z test_device_map_gpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1800677Z test_device_map_gpu_non_default_to_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1801173Z test_device_map_gpu_to_cpu_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1801679Z test_device_map_gpu_to_cpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1802166Z test_device_maps_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1802622Z test_device_maps_in_options (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1803128Z test_device_maps_invalid_max_local_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1803649Z test_device_maps_invalid_max_remote_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1804229Z test_device_maps_invalid_min_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1804702Z test_device_maps_many_to_one (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1805229Z test_device_maps_missing_config (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1805733Z test_device_maps_missing_config_loop (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1806232Z test_device_maps_missing_config_not_timeout (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1806754Z test_device_maps_missing_config_remote (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1807274Z test_device_maps_missing_config_remote_response 
(__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1807802Z test_device_maps_missing_config_response (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1808310Z test_device_maps_missing_config_response_loop (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1808817Z test_device_maps_multi_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1809302Z test_device_maps_multi_gpu_self (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1809787Z test_device_maps_one_to_many (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1810245Z test_device_maps_remote (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1810722Z test_device_maps_return_to_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1811223Z test_device_maps_return_to_gpu_self (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1811703Z test_device_maps_wrong_worker_name (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1812184Z test_device_mismatch (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1812659Z test_devices_option_mismatch (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1813142Z test_devices_option_mismatch_reverse (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1813632Z test_meta_multiple_tensors (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1814136Z test_owner_rref_forward_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1814657Z test_owner_rref_forward_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1815158Z test_owner_rref_forward_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1815730Z test_owner_rref_forward_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1816247Z test_rref_as_arg_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1816724Z test_rref_as_arg_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1817217Z test_rref_as_arg_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1817711Z test_rref_as_arg_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1818201Z test_rref_as_arg_synchronization5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1818679Z test_rref_forward_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1819187Z test_rref_forward_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1819690Z test_rref_forward_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1820192Z test_rref_forward_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1820675Z test_rref_to_here_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1821177Z test_rref_to_here_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1821678Z test_rref_to_here_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1822218Z test_rref_to_here_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1822724Z test_rref_with_unpickleable_attributes (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1823227Z 
test_tensor_view_as_return_value (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:04:56.1824960Z , <__main__.TensorPipeTensorPipeCudaDistAutogradTest testMethod=test_dist_autograd_sync_streams>, <__main__.TensorPipeTensorPipeCudaDistAutogradTest testMethod=test_gradients_synchronizations>]> 2022-05-18T04:04:56.1825846Z test_device_maps_backward_pass (__main__.TensorPipeTensorPipeCudaDistAutogradTest) 2022-05-18T04:04:56.1826367Z test_dist_autograd_sync_streams (__main__.TensorPipeTensorPipeCudaDistAutogradTest) 2022-05-18T04:04:56.1826895Z test_gradients_synchronizations (__main__.TensorPipeTensorPipeCudaDistAutogradTest) 2022-05-18T04:04:57.0636838Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmchhhabj 2022-05-18T04:04:57.0637787Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmchhhabj/_remote_module_non_scriptable.py 2022-05-18T04:04:57.4811564Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:04:57.4826661Z 2022-05-18T04:04:57.4826990Z Running tests... 2022-05-18T04:04:57.4827424Z ---------------------------------------------------------------------- 2022-05-18T04:04:59.0847364Z test_ddp_dist_autograd_local_vs_remote_gpu (__main__.TensorPipeCudaDdpComparisonTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:04:59.1218312Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 578 2022-05-18T04:04:59.1318692Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 579 2022-05-18T04:04:59.1419516Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 580 2022-05-18T04:04:59.1520814Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 581 2022-05-18T04:05:00.0338093Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz55x3n8p 2022-05-18T04:05:00.0338768Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz55x3n8p/_remote_module_non_scriptable.py 2022-05-18T04:05:00.0341954Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1emq3v44 2022-05-18T04:05:00.0345333Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1emq3v44/_remote_module_non_scriptable.py 2022-05-18T04:05:00.0378976Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph_ij2jvk 2022-05-18T04:05:00.0381721Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph_ij2jvk/_remote_module_non_scriptable.py 2022-05-18T04:05:00.0391513Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpor3oi1nv 2022-05-18T04:05:00.0394231Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpor3oi1nv/_remote_module_non_scriptable.py 2022-05-18T04:05:00.4435644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:05:00.4436198Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:05:00.4471354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:05:00.4550958Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:05:00.6574709Z skip: Need at least 4 CUDA devices (3.174s) 2022-05-18T04:05:00.6574964Z 2022-05-18T04:05:00.6575335Z ---------------------------------------------------------------------- 2022-05-18T04:05:00.6575685Z Ran 1 test in 3.175s 2022-05-18T04:05:00.6575852Z 
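Note: the "skip: Need at least 4 CUDA devices" result just above is a GPU-count gate on TensorPipeCudaDdpComparisonTest.test_ddp_dist_autograd_local_vs_remote_gpu; this runner exposes fewer than four CUDA devices, so the test body never runs. A minimal sketch of such a gate, assuming plain unittest (the decorator name require_n_gpus is illustrative; PyTorch's own helper for this lives in torch.testing._internal.common_distributed as skip_if_lt_x_gpu):

    import unittest
    import torch

    def require_n_gpus(n):
        # Illustrative stand-in for skip_if_lt_x_gpu: skip the test unless at
        # least `n` CUDA devices are visible to this process.
        return unittest.skipUnless(
            torch.cuda.is_available() and torch.cuda.device_count() >= n,
            f"Need at least {n} CUDA devices",
        )

    class DdpComparisonSketch(unittest.TestCase):
        @require_n_gpus(4)
        def test_ddp_dist_autograd_local_vs_remote_gpu(self):
            pass  # body elided; only the gating behaviour is sketched here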
2022-05-18T04:05:00.6575962Z OK (skipped=1) 2022-05-18T04:05:00.6576122Z 2022-05-18T04:05:00.6576257Z Generating XML reports... 2022-05-18T04:05:00.6621755Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDdpComparisonTest-20220518040457.xml 2022-05-18T04:05:01.8552640Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgbi9cfwa 2022-05-18T04:05:01.8553602Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgbi9cfwa/_remote_module_non_scriptable.py 2022-05-18T04:05:02.2702067Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:05:02.2717088Z 2022-05-18T04:05:02.2717460Z Running tests... 2022-05-18T04:05:02.2717959Z ---------------------------------------------------------------------- 2022-05-18T04:05:03.8484655Z test_gpu_simple (__main__.TensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:05:03.8850973Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 749 2022-05-18T04:05:03.8951423Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 750 2022-05-18T04:05:03.9050163Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 751 2022-05-18T04:05:03.9151289Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 752 2022-05-18T04:05:04.7998645Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz3_2o7z3 2022-05-18T04:05:04.7999671Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz3_2o7z3/_remote_module_non_scriptable.py 2022-05-18T04:05:04.8675904Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu4vztd7z 2022-05-18T04:05:04.8677122Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu4vztd7z/_remote_module_non_scriptable.py 2022-05-18T04:05:04.8695006Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5effd62k 2022-05-18T04:05:04.8697963Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5effd62k/_remote_module_non_scriptable.py 2022-05-18T04:05:04.8750966Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7pepomlh 2022-05-18T04:05:04.8753533Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7pepomlh/_remote_module_non_scriptable.py 2022-05-18T04:05:05.2027855Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:05:05.2733864Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:05:05.2801006Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:05:05.2851208Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:05:07.4254005Z ok (5.153s) 2022-05-18T04:05:07.4254274Z 2022-05-18T04:05:07.4254877Z ---------------------------------------------------------------------- 2022-05-18T04:05:07.4255224Z Ran 1 test in 5.154s 2022-05-18T04:05:07.4255393Z 2022-05-18T04:05:07.4255468Z OK 2022-05-18T04:05:07.4255610Z 2022-05-18T04:05:07.4255773Z Generating XML reports... 
2022-05-18T04:05:07.4298389Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518040502.xml 2022-05-18T04:05:08.5992745Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqzw9i1cd 2022-05-18T04:05:08.5993858Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqzw9i1cd/_remote_module_non_scriptable.py 2022-05-18T04:05:09.0085165Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:05:09.0099290Z 2022-05-18T04:05:09.0099741Z Running tests... 2022-05-18T04:05:09.0100239Z ---------------------------------------------------------------------- 2022-05-18T04:05:10.5837117Z test_gpu_to_cpu_continuation (__main__.TensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:05:10.6215741Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1168 2022-05-18T04:05:10.6316863Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1169 2022-05-18T04:05:10.6419588Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 1170 2022-05-18T04:05:10.6524856Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 1171 2022-05-18T04:05:11.5352633Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnc8rsefo 2022-05-18T04:05:11.5353478Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnc8rsefo/_remote_module_non_scriptable.py 2022-05-18T04:05:11.5974987Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplv69q4ly 2022-05-18T04:05:11.5976130Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplv69q4ly/_remote_module_non_scriptable.py 2022-05-18T04:05:11.6002093Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyfrl8kha 2022-05-18T04:05:11.6005186Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyfrl8kha/_remote_module_non_scriptable.py 2022-05-18T04:05:11.6209858Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1li3lhev 2022-05-18T04:05:11.6212074Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1li3lhev/_remote_module_non_scriptable.py 2022-05-18T04:05:11.9520521Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:05:11.9964149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:05:12.0127842Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:05:12.0252229Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:05:14.2640276Z ok (5.254s) 2022-05-18T04:05:14.2640490Z 2022-05-18T04:05:14.2640910Z ---------------------------------------------------------------------- 2022-05-18T04:05:14.2641255Z Ran 1 test in 5.254s 2022-05-18T04:05:14.2641424Z 2022-05-18T04:05:14.2641517Z OK 2022-05-18T04:05:14.2641653Z 2022-05-18T04:05:14.2641791Z Generating XML reports... 
2022-05-18T04:05:14.2684962Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518040509.xml 2022-05-18T04:05:15.4454211Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa674hu5b 2022-05-18T04:05:15.4455496Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa674hu5b/_remote_module_non_scriptable.py 2022-05-18T04:05:15.8557012Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:05:15.8571687Z 2022-05-18T04:05:15.8572106Z Running tests... 2022-05-18T04:05:15.8572610Z ---------------------------------------------------------------------- 2022-05-18T04:05:17.4449228Z test_gpu_to_cpu_continuation_gpu_root (__main__.TensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:05:17.4829335Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1587 2022-05-18T04:05:17.4929551Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1588 2022-05-18T04:05:17.5032665Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 1589 2022-05-18T04:05:17.5135591Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 1590 2022-05-18T04:05:18.3861822Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4fuhz_cd 2022-05-18T04:05:18.3862456Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4fuhz_cd/_remote_module_non_scriptable.py 2022-05-18T04:05:18.3876970Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwmimvjt2 2022-05-18T04:05:18.3879709Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwmimvjt2/_remote_module_non_scriptable.py 2022-05-18T04:05:18.3936894Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8dpn9w36 2022-05-18T04:05:18.3940036Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8dpn9w36/_remote_module_non_scriptable.py 2022-05-18T04:05:18.4197510Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9hhetmej 2022-05-18T04:05:18.4199925Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9hhetmej/_remote_module_non_scriptable.py 2022-05-18T04:05:18.7940189Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:05:18.7954758Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:05:18.7975791Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:05:18.8226316Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:05:21.0238499Z ok (5.166s) 2022-05-18T04:05:21.0238733Z 2022-05-18T04:05:21.0239149Z ---------------------------------------------------------------------- 2022-05-18T04:05:21.0239521Z Ran 1 test in 5.167s 2022-05-18T04:05:21.0239673Z 2022-05-18T04:05:21.0239766Z OK 2022-05-18T04:05:21.0239905Z 2022-05-18T04:05:21.0240040Z Generating XML reports... 
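Note: the TensorPipeCudaDistAutogradTest cases above (test_gpu_simple, test_gpu_to_cpu_continuation, test_gpu_to_cpu_continuation_gpu_root), like the long test_device_map_* / test_device_maps_* list earlier, depend on TensorPipe device maps, which tell the RPC layer where CUDA tensors sent to a peer should land. A minimal sketch of that configuration, assuming two RPC processes named worker0 and worker1 with one GPU each (worker names, port, and thread count are illustrative, not taken from this log):

    import os
    import torch
    import torch.distributed.rpc as rpc

    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")

    opts = rpc.TensorPipeRpcBackendOptions(num_worker_threads=8)
    # Tensors living on this worker's cuda:0 are placed on the callee's cuda:0.
    opts.set_device_map("worker1", {0: 0})

    rpc.init_rpc("worker0", rank=0, world_size=2, rpc_backend_options=opts)
    # A second process must run a matching init_rpc("worker1", rank=1, ...) for
    # this call to complete; the CUDA tensor argument travels via the device map.
    out = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2, device="cuda:0"), 1))
    rpc.shutdown()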
2022-05-18T04:05:21.0282058Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518040515.xml 2022-05-18T04:05:22.1792862Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy5bp1ncw 2022-05-18T04:05:22.1793943Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy5bp1ncw/_remote_module_non_scriptable.py 2022-05-18T04:05:22.5793663Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:05:22.5807719Z 2022-05-18T04:05:22.5807954Z Running tests... 2022-05-18T04:05:22.5808538Z ---------------------------------------------------------------------- 2022-05-18T04:05:24.1348492Z test_input_moved_to_cuda_device (__main__.TensorPipeCudaRemoteModuleTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:05:24.1721827Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2006 2022-05-18T04:05:24.1821737Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2007 2022-05-18T04:05:25.0651519Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx12bbprt 2022-05-18T04:05:25.0652376Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx12bbprt/_remote_module_non_scriptable.py 2022-05-18T04:05:25.0662292Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpynmzs0e2 2022-05-18T04:05:25.0665542Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpynmzs0e2/_remote_module_non_scriptable.py 2022-05-18T04:05:25.4650632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:05:25.4791956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:05:27.1911247Z ok (4.610s) 2022-05-18T04:05:27.1911473Z 2022-05-18T04:05:27.1911891Z ---------------------------------------------------------------------- 2022-05-18T04:05:27.1912234Z Ran 1 test in 4.610s 2022-05-18T04:05:27.1912406Z 2022-05-18T04:05:27.1912483Z OK 2022-05-18T04:05:27.1912618Z 2022-05-18T04:05:27.1912757Z Generating XML reports... 2022-05-18T04:05:27.1955567Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518040522.xml 2022-05-18T04:05:28.3628861Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeqtm_p3i 2022-05-18T04:05:28.3630075Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeqtm_p3i/_remote_module_non_scriptable.py 2022-05-18T04:05:28.7758706Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:05:28.7773135Z 2022-05-18T04:05:28.7773563Z Running tests... 2022-05-18T04:05:28.7773999Z ---------------------------------------------------------------------- 2022-05-18T04:05:30.3611682Z test_input_moved_to_cuda_device_script (__main__.TensorPipeCudaRemoteModuleTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:05:30.3997144Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2194 2022-05-18T04:05:30.4099052Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2195 2022-05-18T04:05:31.3255826Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxblk223b 2022-05-18T04:05:31.3257013Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxblk223b/_remote_module_non_scriptable.py 2022-05-18T04:05:31.3897412Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmygkbjrs 2022-05-18T04:05:31.3899267Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmygkbjrs/_remote_module_non_scriptable.py 2022-05-18T04:05:31.7210202Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:05:31.8007907Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:05:31.9597584Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxblk223b/_remote_module___torch___torch_testing__internal_distributed_nn_api_remote_module_test_MyModuleInterface.py 2022-05-18T04:05:31.9598424Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmygkbjrs/_remote_module___torch___torch_testing__internal_distributed_nn_api_remote_module_test_MyModuleInterface.py 2022-05-18T04:05:31.9678489Z INFO:torch.distributed.nn.jit.instantiator:Skipped writing /tmp/tmpmygkbjrs/_remote_module___torch___torch_testing__internal_distributed_nn_api_remote_module_test_MyModuleInterface.py 2022-05-18T04:05:33.7190214Z ok (4.941s) 2022-05-18T04:05:33.7190434Z 2022-05-18T04:05:33.7190844Z ---------------------------------------------------------------------- 2022-05-18T04:05:33.7191195Z Ran 1 test in 4.942s 2022-05-18T04:05:33.7191362Z 2022-05-18T04:05:33.7191456Z OK 2022-05-18T04:05:33.7191589Z 2022-05-18T04:05:33.7191724Z Generating XML reports... 2022-05-18T04:05:33.7234710Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518040528.xml 2022-05-18T04:05:34.9014216Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7a87l4wz 2022-05-18T04:05:34.9015600Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7a87l4wz/_remote_module_non_scriptable.py 2022-05-18T04:05:35.3097356Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:05:35.3112040Z 2022-05-18T04:05:35.3112522Z Running tests... 2022-05-18T04:05:35.3113031Z ---------------------------------------------------------------------- 2022-05-18T04:05:36.8925254Z test_invalid_devices (__main__.TensorPipeCudaRemoteModuleTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:05:36.9305831Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2398 2022-05-18T04:05:36.9407022Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2399 2022-05-18T04:05:37.8183585Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3rluff3n 2022-05-18T04:05:37.8185270Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3rluff3n/_remote_module_non_scriptable.py 2022-05-18T04:05:37.8399855Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnce7qzlk 2022-05-18T04:05:37.8402501Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnce7qzlk/_remote_module_non_scriptable.py 2022-05-18T04:05:38.2281459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:05:38.2378012Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:05:38.4339807Z On WorkerInfo(id=1, name=worker1): 2022-05-18T04:05:38.4364844Z RuntimeError('CUDA error: invalid device ordinal\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nException raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f054b8b31bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #1: + 0x146b4 (0x7f054bb056b4 in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)\nframe #2: + 0xd8821d (0x7f054cacc21d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #3: + 0x2c59814 (0x7f054e99d814 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #4: + 0x2c598fb (0x7f054e99d8fb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x10f (0x7f05578f751f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x1a8f8b5 (0x7f0557b588b5 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7f0557936314 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7f05573516ea in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x1c29a63 (0x7f0557cf2a63 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7f05576b863d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: + 0x1a920d1 (0x7f0557b5b0d1 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7f05576b863d in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #13: + 0x2a525ce (0x7f0558b1b5ce in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #14: + 0x2a52b4b (0x7f0558b1bb4b in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #15: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7f055772d5d2 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7f05573488de in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #17: + 0x1d1aa99 (0x7f0557de3a99 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7f0557842676 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #19: + 0x3224b0 (0x7f056197d4b0 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #20: + 0x322965 (0x7f056197d965 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #21: + 0x1bfb9c (0x558c55211b9c in /opt/conda/bin/python)\nframe #22: + 0xff72f (0x558c5515172f in /opt/conda/bin/python)\nframe #23: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python)\nframe #24: _PyFunction_Vectorcall + 0x1d4 (0x558c551e9354 in /opt/conda/bin/python)\nframe #25: + 0xfdae6 (0x558c5514fae6 in /opt/conda/bin/python)\nframe #26: + 0x197bf9 (0x558c551e9bf9 in /opt/conda/bin/python)\nframe #27: + 0xff755 (0x558c55151755 in /opt/conda/bin/python)\nframe #28: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python)\nframe #29: + 0x197ca4 (0x558c551e9ca4 in /opt/conda/bin/python)\nframe #30: + 0xff755 (0x558c55151755 in /opt/conda/bin/python)\nframe #31: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python)\nframe #32: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python)\nframe #33: _PyEval_EvalFrameDefault + 0x2610 (0x558c552299f0 in /opt/conda/bin/python)\nframe #34: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python)\nframe #35: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python)\nframe #36: + 0x94774a (0x7f0561fa274a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #37: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f0561fa0a3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #38: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7f0561fa3b25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #39: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7f0561fa41e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #40: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7f0559db0b44 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #41: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, 
torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f0561fa3915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #42: + 0x3ce0e43 (0x7f0559da9e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #43: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f0559daaa38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #44: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f0559da50b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #45: + 0x3d10b42 (0x7f0559dd9b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #46: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f054b8a15eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #47: + 0xc9039 (0x7f0564fe3039 in /opt/conda/bin/../lib/libstdc++.so.6)\nframe #48: + 0x76db (0x7f059a5e86db in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #49: clone + 0x3f (0x7f059a31161f in /lib/x86_64-linux-gnu/libc.so.6)\n') 2022-05-18T04:05:38.4374508Z Traceback (most recent call last): 2022-05-18T04:05:38.4375067Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:05:38.4375532Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:05:38.4376117Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/nn/api/remote_module.py", line 89, in _create_module 2022-05-18T04:05:38.4376493Z module.to(device) 2022-05-18T04:05:38.4376951Z File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 927, in to 2022-05-18T04:05:38.4377322Z return self._apply(convert) 2022-05-18T04:05:38.4377811Z File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602, in _apply 2022-05-18T04:05:38.4378168Z param_applied = fn(param) 2022-05-18T04:05:38.4378649Z File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 925, in convert 2022-05-18T04:05:38.4379119Z return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking) 2022-05-18T04:05:38.4379500Z RuntimeError: CUDA error: invalid device ordinal 2022-05-18T04:05:38.4379956Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. 2022-05-18T04:05:38.4380416Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1. 
2022-05-18T04:05:38.4380895Z Exception raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first): 2022-05-18T04:05:38.4381733Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f054b8b31bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:05:38.4382468Z frame #1: + 0x146b4 (0x7f054bb056b4 in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10_cuda.so) 2022-05-18T04:05:38.4383192Z frame #2: + 0xd8821d (0x7f054cacc21d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:05:38.4384347Z frame #3: + 0x2c59814 (0x7f054e99d814 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:05:38.4385102Z frame #4: + 0x2c598fb (0x7f054e99d8fb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:05:38.4386112Z frame #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x10f (0x7f05578f751f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4386951Z frame #6: + 0x1a8f8b5 (0x7f0557b588b5 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4387858Z frame #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7f0557936314 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4388954Z frame #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7f05573516ea in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4389753Z frame #9: + 0x1c29a63 (0x7f0557cf2a63 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4390843Z frame #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7f05576b863d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4391678Z frame #11: + 0x1a920d1 (0x7f0557b5b0d1 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4392688Z frame #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7f05576b863d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4393515Z frame #13: + 0x2a525ce (0x7f0558b1b5ce in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4394157Z frame #14: + 0x2a52b4b (0x7f0558b1bb4b in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4395106Z frame #15: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7f055772d5d2 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4396203Z frame #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7f05573488de in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4397008Z frame #17: + 0x1d1aa99 (0x7f0557de3a99 in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4397964Z frame #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7f0557842676 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4398790Z frame #19: + 0x3224b0 (0x7f056197d4b0 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4399429Z frame #20: + 0x322965 (0x7f056197d965 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4399947Z frame #21: + 0x1bfb9c (0x558c55211b9c in /opt/conda/bin/python) 2022-05-18T04:05:38.4400343Z frame #22: + 0xff72f (0x558c5515172f in /opt/conda/bin/python) 2022-05-18T04:05:38.4400739Z frame #23: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python) 2022-05-18T04:05:38.4401141Z frame #24: _PyFunction_Vectorcall + 0x1d4 (0x558c551e9354 in /opt/conda/bin/python) 2022-05-18T04:05:38.4401556Z frame #25: + 0xfdae6 (0x558c5514fae6 in /opt/conda/bin/python) 2022-05-18T04:05:38.4401939Z frame #26: + 0x197bf9 (0x558c551e9bf9 in /opt/conda/bin/python) 2022-05-18T04:05:38.4402337Z frame #27: + 0xff755 (0x558c55151755 in /opt/conda/bin/python) 2022-05-18T04:05:38.4402727Z frame #28: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python) 2022-05-18T04:05:38.4403098Z frame #29: + 0x197ca4 (0x558c551e9ca4 in /opt/conda/bin/python) 2022-05-18T04:05:38.4403499Z frame #30: + 0xff755 (0x558c55151755 in /opt/conda/bin/python) 2022-05-18T04:05:38.4403900Z frame #31: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python) 2022-05-18T04:05:38.4404301Z frame #32: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python) 2022-05-18T04:05:38.4404693Z frame #33: _PyEval_EvalFrameDefault + 0x2610 (0x558c552299f0 in /opt/conda/bin/python) 2022-05-18T04:05:38.4405176Z frame #34: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python) 2022-05-18T04:05:38.4405572Z frame #35: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python) 2022-05-18T04:05:38.4406150Z frame #36: + 0x94774a (0x7f0561fa274a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4406939Z frame #37: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f0561fa0a3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4407949Z frame #38: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7f0561fa3b25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4409071Z frame #39: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7f0561fa41e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4410290Z frame #40: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7f0559db0b44 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4411548Z frame #41: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f0561fa3915 in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4412420Z frame #42: + 0x3ce0e43 (0x7f0559da9e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4413359Z frame #43: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f0559daaa38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4414412Z frame #44: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f0559da50b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4415249Z frame #45: + 0x3d10b42 (0x7f0559dd9b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4415944Z frame #46: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f054b8a15eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:05:38.4416427Z frame #47: + 0xc9039 (0x7f0564fe3039 in /opt/conda/bin/../lib/libstdc++.so.6) 2022-05-18T04:05:38.4416978Z frame #48: + 0x76db (0x7f059a5e86db in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:05:38.4417476Z frame #49: clone + 0x3f (0x7f059a31161f in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:05:38.4417703Z 2022-05-18T04:05:38.4417722Z 2022-05-18T04:05:38.4417861Z On WorkerInfo(id=1, name=worker1): 2022-05-18T04:05:38.4453010Z RuntimeError('On WorkerInfo(id=1, name=worker1):\nRuntimeError(\'CUDA error: invalid device ordinal\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nException raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f054b8b31bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #1: + 0x146b4 (0x7f054bb056b4 in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)\nframe #2: + 0xd8821d (0x7f054cacc21d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #3: + 0x2c59814 (0x7f054e99d814 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #4: + 0x2c598fb (0x7f054e99d8fb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x10f (0x7f05578f751f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x1a8f8b5 (0x7f0557b588b5 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7f0557936314 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7f05573516ea in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x1c29a63 (0x7f0557cf2a63 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, 
bool, c10::optional) + 0x10d (0x7f05576b863d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: + 0x1a920d1 (0x7f0557b5b0d1 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7f05576b863d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #13: + 0x2a525ce (0x7f0558b1b5ce in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #14: + 0x2a52b4b (0x7f0558b1bb4b in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #15: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7f055772d5d2 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7f05573488de in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #17: + 0x1d1aa99 (0x7f0557de3a99 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7f0557842676 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #19: + 0x3224b0 (0x7f056197d4b0 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #20: + 0x322965 (0x7f056197d965 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #21: + 0x1bfb9c (0x558c55211b9c in /opt/conda/bin/python)\nframe #22: + 0xff72f (0x558c5515172f in /opt/conda/bin/python)\nframe #23: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python)\nframe #24: _PyFunction_Vectorcall + 0x1d4 (0x558c551e9354 in /opt/conda/bin/python)\nframe #25: + 0xfdae6 (0x558c5514fae6 in /opt/conda/bin/python)\nframe #26: + 0x197bf9 (0x558c551e9bf9 in /opt/conda/bin/python)\nframe #27: + 0xff755 (0x558c55151755 in /opt/conda/bin/python)\nframe #28: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python)\nframe #29: + 0x197ca4 (0x558c551e9ca4 in /opt/conda/bin/python)\nframe #30: + 0xff755 (0x558c55151755 in /opt/conda/bin/python)\nframe #31: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python)\nframe #32: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python)\nframe #33: _PyEval_EvalFrameDefault + 0x2610 (0x558c552299f0 in /opt/conda/bin/python)\nframe #34: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python)\nframe #35: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python)\nframe #36: + 0x94774a (0x7f0561fa274a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #37: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f0561fa0a3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #38: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7f0561fa3b25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #39: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7f0561fa41e3 in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #40: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7f0559db0b44 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #41: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f0561fa3915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #42: + 0x3ce0e43 (0x7f0559da9e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #43: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f0559daaa38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #44: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f0559da50b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #45: + 0x3d10b42 (0x7f0559dd9b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #46: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f054b8a15eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #47: + 0xc9039 (0x7f0564fe3039 in /opt/conda/bin/../lib/libstdc++.so.6)\nframe #48: + 0x76db (0x7f059a5e86db in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #49: clone + 0x3f (0x7f059a31161f in /lib/x86_64-linux-gnu/libc.so.6)\n\')\nTraceback (most recent call last):\n File "/opt/conda/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/opt/conda/lib/python3.9/site-packages/torch/distributed/nn/api/remote_module.py", line 89, in _create_module\n module.to(device)\n File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 927, in to\n return self._apply(convert)\n File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602, in _apply\n param_applied = fn(param)\n File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 925, in convert\n return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)\nRuntimeError: CUDA error: invalid device ordinal\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nException raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f054b8b31bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #1: + 0x146b4 (0x7f054bb056b4 in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)\nframe #2: + 0xd8821d (0x7f054cacc21d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #3: + 0x2c59814 (0x7f054e99d814 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #4: + 0x2c598fb (0x7f054e99d8fb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, 
c10::optional, c10::optional) + 0x10f (0x7f05578f751f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x1a8f8b5 (0x7f0557b588b5 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7f0557936314 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7f05573516ea in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x1c29a63 (0x7f0557cf2a63 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7f05576b863d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: + 0x1a920d1 (0x7f0557b5b0d1 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7f05576b863d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #13: + 0x2a525ce (0x7f0558b1b5ce in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #14: + 0x2a52b4b (0x7f0558b1bb4b in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #15: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7f055772d5d2 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7f05573488de in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #17: + 0x1d1aa99 (0x7f0557de3a99 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7f0557842676 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #19: + 0x3224b0 (0x7f056197d4b0 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #20: + 0x322965 (0x7f056197d965 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #21: + 0x1bfb9c (0x558c55211b9c in /opt/conda/bin/python)\nframe #22: + 0xff72f (0x558c5515172f in /opt/conda/bin/python)\nframe #23: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python)\nframe #24: _PyFunction_Vectorcall + 0x1d4 (0x558c551e9354 in /opt/conda/bin/python)\nframe #25: + 0xfdae6 (0x558c5514fae6 in /opt/conda/bin/python)\nframe #26: + 0x197bf9 (0x558c551e9bf9 in /opt/conda/bin/python)\nframe #27: + 0xff755 (0x558c55151755 in /opt/conda/bin/python)\nframe #28: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python)\nframe #29: + 0x197ca4 (0x558c551e9ca4 in /opt/conda/bin/python)\nframe #30: + 0xff755 (0x558c55151755 in /opt/conda/bin/python)\nframe #31: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python)\nframe #32: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python)\nframe #33: _PyEval_EvalFrameDefault + 0x2610 (0x558c552299f0 in 
/opt/conda/bin/python)\nframe #34: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python)\nframe #35: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python)\nframe #36: + 0x94774a (0x7f0561fa274a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #37: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f0561fa0a3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #38: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7f0561fa3b25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #39: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7f0561fa41e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #40: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7f0559db0b44 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #41: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f0561fa3915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #42: + 0x3ce0e43 (0x7f0559da9e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #43: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f0559daaa38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #44: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f0559da50b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #45: + 0x3d10b42 (0x7f0559dd9b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #46: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f054b8a15eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #47: + 0xc9039 (0x7f0564fe3039 in /opt/conda/bin/../lib/libstdc++.so.6)\nframe #48: + 0x76db (0x7f059a5e86db in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #49: clone + 0x3f (0x7f059a31161f in /lib/x86_64-linux-gnu/libc.so.6)\n\n') 2022-05-18T04:05:38.4473839Z Traceback (most recent call last): 2022-05-18T04:05:38.4474385Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:05:38.4474851Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:05:38.4475262Z File "/tmp/tmp7a87l4wz/_remote_module_non_scriptable.py", line 47, in _remote_forward 2022-05-18T04:05:38.4475635Z module = module_rref.local_value() 2022-05-18T04:05:38.4476168Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 220, in _handle_exception 2022-05-18T04:05:38.4476743Z raise result.exception_type(result.msg.encode("utf-8").decode("unicode_escape")) 2022-05-18T04:05:38.4477125Z RuntimeError: On WorkerInfo(id=1, name=worker1): 2022-05-18T04:05:38.4477527Z RuntimeError('CUDA error: invalid device ordinal 2022-05-18T04:05:38.4477980Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below 
might be incorrect. 2022-05-18T04:05:38.4478413Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1. 2022-05-18T04:05:38.4478890Z Exception raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first): 2022-05-18T04:05:38.4479734Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f054b8b31bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:05:38.4480460Z frame #1: + 0x146b4 (0x7f054bb056b4 in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10_cuda.so) 2022-05-18T04:05:38.4481079Z frame #2: + 0xd8821d (0x7f054cacc21d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:05:38.4481810Z frame #3: + 0x2c59814 (0x7f054e99d814 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:05:38.4482460Z frame #4: + 0x2c598fb (0x7f054e99d8fb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:05:38.4483451Z frame #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x10f (0x7f05578f751f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4484319Z frame #6: + 0x1a8f8b5 (0x7f0557b588b5 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4485231Z frame #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7f0557936314 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4486323Z frame #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7f05573516ea in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4487196Z frame #9: + 0x1c29a63 (0x7f0557cf2a63 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4488210Z frame #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7f05576b863d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4489051Z frame #11: + 0x1a920d1 (0x7f0557b5b0d1 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4490048Z frame #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7f05576b863d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4490892Z frame #13: + 0x2a525ce (0x7f0558b1b5ce in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4491534Z frame #14: + 0x2a52b4b (0x7f0558b1bb4b in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4492491Z frame #15: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7f055772d5d2 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4493591Z frame #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7f05573488de in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4494394Z frame #17: + 0x1d1aa99 (0x7f0557de3a99 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4495353Z frame #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7f0557842676 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4496173Z frame #19: + 0x3224b0 (0x7f056197d4b0 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4496889Z frame #20: + 0x322965 (0x7f056197d965 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4497358Z frame #21: + 0x1bfb9c (0x558c55211b9c in /opt/conda/bin/python) 2022-05-18T04:05:38.4497744Z frame #22: + 0xff72f (0x558c5515172f in /opt/conda/bin/python) 2022-05-18T04:05:38.4498145Z frame #23: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python) 2022-05-18T04:05:38.4498552Z frame #24: _PyFunction_Vectorcall + 0x1d4 (0x558c551e9354 in /opt/conda/bin/python) 2022-05-18T04:05:38.4498945Z frame #25: + 0xfdae6 (0x558c5514fae6 in /opt/conda/bin/python) 2022-05-18T04:05:38.4499348Z frame #26: + 0x197bf9 (0x558c551e9bf9 in /opt/conda/bin/python) 2022-05-18T04:05:38.4499742Z frame #27: + 0xff755 (0x558c55151755 in /opt/conda/bin/python) 2022-05-18T04:05:38.4500126Z frame #28: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python) 2022-05-18T04:05:38.4500505Z frame #29: + 0x197ca4 (0x558c551e9ca4 in /opt/conda/bin/python) 2022-05-18T04:05:38.4500898Z frame #30: + 0xff755 (0x558c55151755 in /opt/conda/bin/python) 2022-05-18T04:05:38.4501296Z frame #31: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python) 2022-05-18T04:05:38.4501678Z frame #32: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python) 2022-05-18T04:05:38.4502149Z frame #33: _PyEval_EvalFrameDefault + 0x2610 (0x558c552299f0 in /opt/conda/bin/python) 2022-05-18T04:05:38.4502566Z frame #34: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python) 2022-05-18T04:05:38.4502960Z frame #35: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python) 2022-05-18T04:05:38.4503538Z frame #36: + 0x94774a (0x7f0561fa274a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4504754Z frame #37: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f0561fa0a3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4505762Z frame #38: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7f0561fa3b25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4506893Z frame #39: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7f0561fa41e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4508103Z frame #40: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7f0559db0b44 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4509370Z frame #41: 
torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f0561fa3915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4510232Z frame #42: + 0x3ce0e43 (0x7f0559da9e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4511169Z frame #43: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f0559daaa38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4512303Z frame #44: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f0559da50b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4513096Z frame #45: + 0x3d10b42 (0x7f0559dd9b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4513778Z frame #46: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f054b8a15eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:05:38.4514266Z frame #47: + 0xc9039 (0x7f0564fe3039 in /opt/conda/bin/../lib/libstdc++.so.6) 2022-05-18T04:05:38.4514816Z frame #48: + 0x76db (0x7f059a5e86db in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:05:38.4515314Z frame #49: clone + 0x3f (0x7f059a31161f in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:05:38.4515616Z ') 2022-05-18T04:05:38.4515870Z Traceback (most recent call last): 2022-05-18T04:05:38.4516393Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:05:38.4516852Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:05:38.4517407Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/nn/api/remote_module.py", line 89, in _create_module 2022-05-18T04:05:38.4517789Z module.to(device) 2022-05-18T04:05:38.4518236Z File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 927, in to 2022-05-18T04:05:38.4518662Z return self._apply(convert) 2022-05-18T04:05:38.4519141Z File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602, in _apply 2022-05-18T04:05:38.4519512Z param_applied = fn(param) 2022-05-18T04:05:38.4519968Z File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 925, in convert 2022-05-18T04:05:38.4520440Z return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking) 2022-05-18T04:05:38.4520845Z RuntimeError: CUDA error: invalid device ordinal 2022-05-18T04:05:38.4521299Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. 2022-05-18T04:05:38.4521740Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1. 
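Note on the hint repeated in the error text above: CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so errors that CUDA would otherwise report at some later API call are raised at the call that triggered them, giving a more accurate stack trace. A minimal sketch of acting on that hint is below; the file path test/distributed/rpc/cuda/test_tensorpipe_agent.py is inferred from the test-report directory named in this log, and the Python lines are illustrative rather than taken from the run.

    # Sketch, assuming a local PyTorch source checkout with the test files present:
    #
    #   CUDA_LAUNCH_BLOCKING=1 python test/distributed/rpc/cuda/test_tensorpipe_agent.py
    #
    # The same effect from inside Python, provided it runs before any CUDA work:
    import os
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

    import torch
    t = torch.randn(4, device="cuda")  # illustrative CUDA call; launches are now synchronous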
2022-05-18T04:05:38.4522213Z Exception raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first): 2022-05-18T04:05:38.4523051Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f054b8b31bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:05:38.4523776Z frame #1: + 0x146b4 (0x7f054bb056b4 in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10_cuda.so) 2022-05-18T04:05:38.4524392Z frame #2: + 0xd8821d (0x7f054cacc21d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:05:38.4525032Z frame #3: + 0x2c59814 (0x7f054e99d814 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:05:38.4525665Z frame #4: + 0x2c598fb (0x7f054e99d8fb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:05:38.4526661Z frame #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x10f (0x7f05578f751f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4527471Z frame #6: + 0x1a8f8b5 (0x7f0557b588b5 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4528423Z frame #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7f0557936314 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4529568Z frame #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7f05573516ea in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4530376Z frame #9: + 0x1c29a63 (0x7f0557cf2a63 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4531391Z frame #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7f05576b863d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4532229Z frame #11: + 0x1a920d1 (0x7f0557b5b0d1 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4533224Z frame #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7f05576b863d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4534067Z frame #13: + 0x2a525ce (0x7f0558b1b5ce in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4534772Z frame #14: + 0x2a52b4b (0x7f0558b1bb4b in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4535724Z frame #15: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7f055772d5d2 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4536831Z frame #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7f05573488de in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4537616Z frame #17: + 0x1d1aa99 (0x7f0557de3a99 in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4538593Z frame #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7f0557842676 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4539413Z frame #19: + 0x3224b0 (0x7f056197d4b0 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4540047Z frame #20: + 0x322965 (0x7f056197d965 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4540511Z frame #21: + 0x1bfb9c (0x558c55211b9c in /opt/conda/bin/python) 2022-05-18T04:05:38.4540897Z frame #22: + 0xff72f (0x558c5515172f in /opt/conda/bin/python) 2022-05-18T04:05:38.4541295Z frame #23: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python) 2022-05-18T04:05:38.4541698Z frame #24: _PyFunction_Vectorcall + 0x1d4 (0x558c551e9354 in /opt/conda/bin/python) 2022-05-18T04:05:38.4542094Z frame #25: + 0xfdae6 (0x558c5514fae6 in /opt/conda/bin/python) 2022-05-18T04:05:38.4542497Z frame #26: + 0x197bf9 (0x558c551e9bf9 in /opt/conda/bin/python) 2022-05-18T04:05:38.4542889Z frame #27: + 0xff755 (0x558c55151755 in /opt/conda/bin/python) 2022-05-18T04:05:38.4543274Z frame #28: + 0x196663 (0x558c551e8663 in /opt/conda/bin/python) 2022-05-18T04:05:38.4544270Z frame #29: + 0x197ca4 (0x558c551e9ca4 in /opt/conda/bin/python) 2022-05-18T04:05:38.4544758Z frame #30: + 0xff755 (0x558c55151755 in /opt/conda/bin/python) 2022-05-18T04:05:38.4545175Z frame #31: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python) 2022-05-18T04:05:38.4545560Z frame #32: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python) 2022-05-18T04:05:38.4545971Z frame #33: _PyEval_EvalFrameDefault + 0x2610 (0x558c552299f0 in /opt/conda/bin/python) 2022-05-18T04:05:38.4546391Z frame #34: _PyFunction_Vectorcall + 0x104 (0x558c551e9284 in /opt/conda/bin/python) 2022-05-18T04:05:38.4546786Z frame #35: _PyObject_Call + 0x1da (0x558c55197a7a in /opt/conda/bin/python) 2022-05-18T04:05:38.4547369Z frame #36: + 0x94774a (0x7f0561fa274a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4548162Z frame #37: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f0561fa0a3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4549165Z frame #38: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7f0561fa3b25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4550294Z frame #39: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7f0561fa41e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4551579Z frame #40: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7f0559db0b44 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4552845Z frame #41: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f0561fa3915 in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:05:38.4553715Z frame #42: + 0x3ce0e43 (0x7f0559da9e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4554646Z frame #43: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f0559daaa38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4555704Z frame #44: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f0559da50b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4556483Z frame #45: + 0x3d10b42 (0x7f0559dd9b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:05:38.4557162Z frame #46: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f054b8a15eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:05:38.4557641Z frame #47: + 0xc9039 (0x7f0564fe3039 in /opt/conda/bin/../lib/libstdc++.so.6) 2022-05-18T04:05:38.4558189Z frame #48: + 0x76db (0x7f059a5e86db in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:05:38.4558689Z frame #49: clone + 0x3f (0x7f059a31161f in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:05:38.4558913Z 2022-05-18T04:05:38.4558933Z 2022-05-18T04:05:38.4558953Z 2022-05-18T04:05:38.6461580Z ok (3.335s) 2022-05-18T04:05:38.6461768Z 2022-05-18T04:05:38.6462171Z ---------------------------------------------------------------------- 2022-05-18T04:05:38.6462513Z Ran 1 test in 3.335s 2022-05-18T04:05:38.6462683Z 2022-05-18T04:05:38.6462780Z OK 2022-05-18T04:05:38.6462916Z 2022-05-18T04:05:38.6463339Z Generating XML reports... 2022-05-18T04:05:38.6505674Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518040535.xml 2022-05-18T04:05:39.8133535Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp27r1tlpx 2022-05-18T04:05:39.8135140Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp27r1tlpx/_remote_module_non_scriptable.py 2022-05-18T04:05:40.2248878Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:05:40.2263643Z 2022-05-18T04:05:40.2263928Z Running tests... 2022-05-18T04:05:40.2264367Z ---------------------------------------------------------------------- 2022-05-18T04:05:41.7896092Z test_valid_device (__main__.TensorPipeCudaRemoteModuleTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:05:41.8278234Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2585 2022-05-18T04:05:41.8379163Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2586 2022-05-18T04:05:42.7027835Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpat_99ois 2022-05-18T04:05:42.7029255Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpat_99ois/_remote_module_non_scriptable.py 2022-05-18T04:05:42.7126308Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmposi2hxxx 2022-05-18T04:05:42.7129206Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmposi2hxxx/_remote_module_non_scriptable.py 2022-05-18T04:05:43.0976554Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:05:43.1202209Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:05:44.8463758Z ok (4.620s) 2022-05-18T04:05:44.8464062Z 2022-05-18T04:05:44.8464762Z ---------------------------------------------------------------------- 2022-05-18T04:05:44.8465117Z Ran 1 test in 4.620s 2022-05-18T04:05:44.8465281Z 2022-05-18T04:05:44.8465378Z OK 2022-05-18T04:05:44.8465513Z 2022-05-18T04:05:44.8465629Z Generating XML reports... 2022-05-18T04:05:44.8507646Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518040540.xml 2022-05-18T04:05:46.0116809Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdvhlf80y 2022-05-18T04:05:46.0118051Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdvhlf80y/_remote_module_non_scriptable.py 2022-05-18T04:05:46.4285932Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:05:46.4300772Z 2022-05-18T04:05:46.4301000Z Running tests... 2022-05-18T04:05:46.4301548Z ---------------------------------------------------------------------- 2022-05-18T04:05:48.0037427Z test_profiler_remote_cuda (__main__.TensorPipeCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:05:48.0408979Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2773 2022-05-18T04:05:48.0509414Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2774 2022-05-18T04:05:48.0610965Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 2775 2022-05-18T04:05:48.0714654Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 2776 2022-05-18T04:05:48.9662067Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyhhauf72 2022-05-18T04:05:48.9662981Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyhhauf72/_remote_module_non_scriptable.py 2022-05-18T04:05:48.9963173Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzf_qwhb4 2022-05-18T04:05:48.9966061Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzf_qwhb4/_remote_module_non_scriptable.py 2022-05-18T04:05:49.0257059Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe_7da2lz 2022-05-18T04:05:49.0259524Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe_7da2lz/_remote_module_non_scriptable.py 2022-05-18T04:05:49.0355266Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8do5qbgq 2022-05-18T04:05:49.0358642Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8do5qbgq/_remote_module_non_scriptable.py 2022-05-18T04:05:49.3773556Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:05:49.4002461Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:05:49.4226664Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:05:49.4343212Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:05:53.6864388Z ok (7.256s) 2022-05-18T04:05:53.6864649Z 2022-05-18T04:05:53.6865097Z ---------------------------------------------------------------------- 2022-05-18T04:05:53.6865439Z Ran 1 test in 7.256s 2022-05-18T04:05:53.6865608Z 2022-05-18T04:05:53.6865710Z OK 2022-05-18T04:05:53.6865825Z 2022-05-18T04:05:53.6865967Z Generating XML reports... 2022-05-18T04:05:53.6909486Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRpcTest-20220518040546.xml 2022-05-18T04:05:54.8331126Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbsgyb2is 2022-05-18T04:05:54.8332139Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbsgyb2is/_remote_module_non_scriptable.py 2022-05-18T04:05:55.2286790Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:05:55.2300529Z 2022-05-18T04:05:55.2300772Z Running tests... 2022-05-18T04:05:55.2301199Z ---------------------------------------------------------------------- 2022-05-18T04:05:56.7746279Z test_basic_gloo_ckpt_always (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:05:56.8119009Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3116 2022-05-18T04:05:56.8218820Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3117 2022-05-18T04:05:57.7343242Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu5afybf6 2022-05-18T04:05:57.7344079Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu5afybf6/_remote_module_non_scriptable.py 2022-05-18T04:05:57.7471020Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvdlwog5k 2022-05-18T04:05:57.7474278Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvdlwog5k/_remote_module_non_scriptable.py 2022-05-18T04:05:58.1349059Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:05:58.1579792Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:05:58.3267343Z skip: Need at least 4 CUDA devices (3.096s) 2022-05-18T04:05:58.3267573Z 2022-05-18T04:05:58.3267967Z ---------------------------------------------------------------------- 2022-05-18T04:05:58.3268282Z Ran 1 test in 3.097s 2022-05-18T04:05:58.3268454Z 2022-05-18T04:05:58.3268564Z OK (skipped=1) 2022-05-18T04:05:58.3268736Z 2022-05-18T04:05:58.3268864Z Generating XML reports... 2022-05-18T04:05:58.3311215Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040555.xml 2022-05-18T04:05:59.4938286Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6oe_hrkn 2022-05-18T04:05:59.4939758Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6oe_hrkn/_remote_module_non_scriptable.py 2022-05-18T04:05:59.9039952Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:05:59.9055240Z 2022-05-18T04:05:59.9055734Z Running tests... 2022-05-18T04:05:59.9056234Z ---------------------------------------------------------------------- 2022-05-18T04:06:01.5118091Z test_basic_gloo_ckpt_except_last (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:06:01.5506239Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3219 2022-05-18T04:06:01.5610349Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3220 2022-05-18T04:06:02.4539493Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy91v279v 2022-05-18T04:06:02.4540445Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy91v279v/_remote_module_non_scriptable.py 2022-05-18T04:06:02.4753571Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqvb4wqg5 2022-05-18T04:06:02.4756508Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqvb4wqg5/_remote_module_non_scriptable.py 2022-05-18T04:06:02.8543763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:06:02.8701019Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:06:03.0660565Z skip: Need at least 4 CUDA devices (3.160s) 2022-05-18T04:06:03.0660857Z 2022-05-18T04:06:03.0661597Z ---------------------------------------------------------------------- 2022-05-18T04:06:03.0661937Z Ran 1 test in 3.161s 2022-05-18T04:06:03.0662102Z 2022-05-18T04:06:03.0662211Z OK (skipped=1) 2022-05-18T04:06:03.0662837Z 2022-05-18T04:06:03.0662969Z Generating XML reports... 2022-05-18T04:06:03.0705953Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040559.xml 2022-05-18T04:06:04.2242652Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj4kg_lw0 2022-05-18T04:06:04.2243828Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj4kg_lw0/_remote_module_non_scriptable.py 2022-05-18T04:06:04.6211550Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:06:04.6225461Z 2022-05-18T04:06:04.6225692Z Running tests... 2022-05-18T04:06:04.6226111Z ---------------------------------------------------------------------- 2022-05-18T04:06:06.1691471Z test_basic_gloo_ckpt_never (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:06:06.2067089Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3322 2022-05-18T04:06:06.2167516Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3323 2022-05-18T04:06:07.1472451Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp49gxpgsz 2022-05-18T04:06:07.1473047Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmperifuuyq 2022-05-18T04:06:07.1473815Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp49gxpgsz/_remote_module_non_scriptable.py 2022-05-18T04:06:07.1474480Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmperifuuyq/_remote_module_non_scriptable.py 2022-05-18T04:06:07.5448248Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:06:07.5479710Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:06:07.7218231Z skip: Need at least 4 CUDA devices (3.099s) 2022-05-18T04:06:07.7218532Z 2022-05-18T04:06:07.7219167Z ---------------------------------------------------------------------- 2022-05-18T04:06:07.7219497Z Ran 1 test in 3.099s 2022-05-18T04:06:07.7219663Z 2022-05-18T04:06:07.7219773Z OK (skipped=1) 2022-05-18T04:06:07.7219928Z 2022-05-18T04:06:07.7220056Z Generating XML reports... 2022-05-18T04:06:07.7263028Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040604.xml 2022-05-18T04:06:08.8895495Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb1wjui1s 2022-05-18T04:06:08.8896695Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb1wjui1s/_remote_module_non_scriptable.py 2022-05-18T04:06:09.3015288Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:06:09.3031545Z 2022-05-18T04:06:09.3031799Z Running tests... 2022-05-18T04:06:09.3032233Z ---------------------------------------------------------------------- 2022-05-18T04:06:10.9065880Z test_basic_gloo_ckpt_never_find_unused (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:06:10.9440584Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3425 2022-05-18T04:06:10.9540487Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3426 2022-05-18T04:06:11.8454036Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu4p5_lc8 2022-05-18T04:06:11.8455278Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu4p5_lc8/_remote_module_non_scriptable.py 2022-05-18T04:06:11.8558611Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4wk_tgf0 2022-05-18T04:06:11.8561686Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4wk_tgf0/_remote_module_non_scriptable.py 2022-05-18T04:06:12.2415501Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:06:12.2663329Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:06:12.4590423Z skip: Need at least 4 CUDA devices (3.156s) 2022-05-18T04:06:12.4590852Z 2022-05-18T04:06:12.4591388Z ---------------------------------------------------------------------- 2022-05-18T04:06:12.4591714Z Ran 1 test in 3.156s 2022-05-18T04:06:12.4591880Z 2022-05-18T04:06:12.4591990Z OK (skipped=1) 2022-05-18T04:06:12.4592147Z 2022-05-18T04:06:12.4594340Z Generating XML reports... 2022-05-18T04:06:12.4635440Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040609.xml 2022-05-18T04:06:13.6270456Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpufwf7eri 2022-05-18T04:06:13.6271708Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpufwf7eri/_remote_module_non_scriptable.py 2022-05-18T04:06:14.0269943Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:06:14.0284475Z 2022-05-18T04:06:14.0284717Z Running tests... 2022-05-18T04:06:14.0285132Z ---------------------------------------------------------------------- 2022-05-18T04:06:15.5849808Z test_basic_nccl_ckpt_always (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:06:15.6222865Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3528 2022-05-18T04:06:15.6324479Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3529 2022-05-18T04:06:16.5224192Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0q0qrbwe 2022-05-18T04:06:16.5225636Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0q0qrbwe/_remote_module_non_scriptable.py 2022-05-18T04:06:16.5537908Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6gdndrd7 2022-05-18T04:06:16.5540696Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6gdndrd7/_remote_module_non_scriptable.py 2022-05-18T04:06:16.9320143Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:06:16.9556600Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:06:17.1374053Z skip: Need at least 4 CUDA devices (3.109s) 2022-05-18T04:06:17.1374520Z 2022-05-18T04:06:17.1375695Z ---------------------------------------------------------------------- 2022-05-18T04:06:17.1376049Z Ran 1 test in 3.109s 2022-05-18T04:06:17.1376214Z 2022-05-18T04:06:17.1376325Z OK (skipped=1) 2022-05-18T04:06:17.1376482Z 2022-05-18T04:06:17.1376607Z Generating XML reports... 2022-05-18T04:06:17.1418874Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040614.xml 2022-05-18T04:06:18.2980965Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_vhwpbzv 2022-05-18T04:06:18.2981724Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_vhwpbzv/_remote_module_non_scriptable.py 2022-05-18T04:06:18.6976568Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:06:18.6990729Z 2022-05-18T04:06:18.6990979Z Running tests... 2022-05-18T04:06:18.6991413Z ---------------------------------------------------------------------- 2022-05-18T04:06:20.2362953Z test_basic_nccl_ckpt_except_last (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:06:20.2740607Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3631 2022-05-18T04:06:20.2841734Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3632 2022-05-18T04:06:21.1676071Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphtnnkj3g 2022-05-18T04:06:21.1677597Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphtnnkj3g/_remote_module_non_scriptable.py 2022-05-18T04:06:21.2161791Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfnk94i2u 2022-05-18T04:06:21.2165072Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfnk94i2u/_remote_module_non_scriptable.py 2022-05-18T04:06:21.5740806Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:06:21.6428747Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:06:21.7889801Z skip: Need at least 4 CUDA devices (3.090s) 2022-05-18T04:06:21.7890011Z 2022-05-18T04:06:21.7890416Z ---------------------------------------------------------------------- 2022-05-18T04:06:21.7891067Z Ran 1 test in 3.090s 2022-05-18T04:06:21.7891419Z 2022-05-18T04:06:21.7891656Z OK (skipped=1) 2022-05-18T04:06:21.7891946Z 2022-05-18T04:06:21.7892058Z Generating XML reports... 2022-05-18T04:06:21.7934724Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040618.xml 2022-05-18T04:06:22.9625871Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbtehpvwj 2022-05-18T04:06:22.9627083Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbtehpvwj/_remote_module_non_scriptable.py 2022-05-18T04:06:23.3715877Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:06:23.3730466Z 2022-05-18T04:06:23.3731075Z Running tests... 2022-05-18T04:06:23.3731617Z ---------------------------------------------------------------------- 2022-05-18T04:06:24.9460970Z test_basic_nccl_ckpt_never (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:06:24.9843766Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3734 2022-05-18T04:06:24.9945500Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3735 2022-05-18T04:06:25.8767077Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoiu7uxus 2022-05-18T04:06:25.8768217Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoiu7uxus/_remote_module_non_scriptable.py 2022-05-18T04:06:25.9240875Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi23ap5qf 2022-05-18T04:06:25.9243306Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi23ap5qf/_remote_module_non_scriptable.py 2022-05-18T04:06:26.2878904Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:06:26.3280185Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:06:26.4994783Z skip: Need at least 4 CUDA devices (3.126s) 2022-05-18T04:06:26.4995030Z 2022-05-18T04:06:26.4995417Z ---------------------------------------------------------------------- 2022-05-18T04:06:26.4995786Z Ran 1 test in 3.126s 2022-05-18T04:06:26.4995962Z 2022-05-18T04:06:26.4996052Z OK (skipped=1) 2022-05-18T04:06:26.4996211Z 2022-05-18T04:06:26.4996337Z Generating XML reports... 2022-05-18T04:06:26.5040149Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040623.xml 2022-05-18T04:06:27.6694658Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwn7fn71a 2022-05-18T04:06:27.6695838Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwn7fn71a/_remote_module_non_scriptable.py 2022-05-18T04:06:28.0798292Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:06:28.0814515Z 2022-05-18T04:06:28.0814895Z Running tests... 2022-05-18T04:06:28.0815780Z ---------------------------------------------------------------------- 2022-05-18T04:06:29.6624265Z test_basic_nccl_ckpt_never_find_unused (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:06:29.6999572Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3837 2022-05-18T04:06:29.7100842Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3838 2022-05-18T04:06:30.5861516Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy_kue97k 2022-05-18T04:06:30.5862691Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy_kue97k/_remote_module_non_scriptable.py 2022-05-18T04:06:30.5923316Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphnl6mh25 2022-05-18T04:06:30.5926361Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphnl6mh25/_remote_module_non_scriptable.py 2022-05-18T04:06:30.9823896Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:06:31.0011368Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:06:31.2150159Z skip: Need at least 4 CUDA devices (3.133s) 2022-05-18T04:06:31.2150588Z 2022-05-18T04:06:31.2151056Z ---------------------------------------------------------------------- 2022-05-18T04:06:31.2151400Z Ran 1 test in 3.134s 2022-05-18T04:06:31.2151565Z 2022-05-18T04:06:31.2151659Z OK (skipped=1) 2022-05-18T04:06:31.2151815Z 2022-05-18T04:06:31.2151942Z Generating XML reports... 2022-05-18T04:06:31.2194528Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040628.xml 2022-05-18T04:06:32.3809432Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph7kbxm0z 2022-05-18T04:06:32.3810273Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph7kbxm0z/_remote_module_non_scriptable.py 2022-05-18T04:06:32.7772710Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:06:32.7786996Z 2022-05-18T04:06:32.7787420Z Running tests... 2022-05-18T04:06:32.7787922Z ---------------------------------------------------------------------- 2022-05-18T04:06:34.3311222Z test_async_execution_nested_with_cuda_future (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:06:34.3684054Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3940 2022-05-18T04:06:34.3785136Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3941 2022-05-18T04:06:34.3888658Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 3942 2022-05-18T04:06:34.3992342Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 3943 2022-05-18T04:06:35.2884906Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptlhj6v5h 2022-05-18T04:06:35.2886706Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptlhj6v5h/_remote_module_non_scriptable.py 2022-05-18T04:06:35.3011990Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxe6d0puk 2022-05-18T04:06:35.3014264Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxe6d0puk/_remote_module_non_scriptable.py 2022-05-18T04:06:35.3123399Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk9xy2j63 2022-05-18T04:06:35.3126277Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk9xy2j63/_remote_module_non_scriptable.py 2022-05-18T04:06:35.3627059Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_yampg3k 2022-05-18T04:06:35.3629580Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_yampg3k/_remote_module_non_scriptable.py 2022-05-18T04:06:35.6963482Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:06:35.7048301Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:06:35.7182873Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:06:35.7722351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:06:40.4144823Z ok (7.635s) 2022-05-18T04:06:40.4145051Z 2022-05-18T04:06:40.4145453Z ---------------------------------------------------------------------- 2022-05-18T04:06:40.4145798Z Ran 1 test in 7.636s 2022-05-18T04:06:40.4145965Z 2022-05-18T04:06:40.4146045Z OK 2022-05-18T04:06:40.4146180Z 2022-05-18T04:06:40.4146319Z Generating XML reports... 2022-05-18T04:06:40.4189881Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040632.xml 2022-05-18T04:06:41.5885652Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4oajcv5r 2022-05-18T04:06:41.5886650Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4oajcv5r/_remote_module_non_scriptable.py 2022-05-18T04:06:41.9983157Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:06:41.9998187Z 2022-05-18T04:06:41.9998344Z Running tests... 2022-05-18T04:06:41.9998791Z ---------------------------------------------------------------------- 2022-05-18T04:06:43.5702434Z test_async_execution_with_cuda_future (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:06:43.6084131Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4283 2022-05-18T04:06:43.6185885Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4284 2022-05-18T04:06:43.6290501Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 4285 2022-05-18T04:06:43.6393923Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 4286 2022-05-18T04:06:44.5168337Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbfiasjgc 2022-05-18T04:06:44.5169502Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbfiasjgc/_remote_module_non_scriptable.py 2022-05-18T04:06:44.5355703Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbnynrpvg 2022-05-18T04:06:44.5358523Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbnynrpvg/_remote_module_non_scriptable.py 2022-05-18T04:06:44.5599176Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm92n7j66 2022-05-18T04:06:44.5602063Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm92n7j66/_remote_module_non_scriptable.py 2022-05-18T04:06:44.5914510Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx8rvdntg 2022-05-18T04:06:44.5917009Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx8rvdntg/_remote_module_non_scriptable.py 2022-05-18T04:06:44.9204871Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:06:44.9346117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:06:44.9744123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:06:44.9981532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:06:52.2635258Z ok (10.263s) 2022-05-18T04:06:52.2635482Z 2022-05-18T04:06:52.2635888Z ---------------------------------------------------------------------- 2022-05-18T04:06:52.2636213Z Ran 1 test in 10.264s 2022-05-18T04:06:52.2636381Z 2022-05-18T04:06:52.2638855Z OK 2022-05-18T04:06:52.2639023Z 2022-05-18T04:06:52.2639189Z Generating XML reports... 2022-05-18T04:06:52.2679466Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040641.xml 2022-05-18T04:06:53.4373146Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa8hw2vet 2022-05-18T04:06:53.4374552Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa8hw2vet/_remote_module_non_scriptable.py 2022-05-18T04:06:53.8480541Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:06:53.8495031Z 2022-05-18T04:06:53.8495158Z Running tests... 2022-05-18T04:06:53.8495606Z ---------------------------------------------------------------------- 2022-05-18T04:06:55.4336950Z test_cuda_future_callback_changes_devices (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:06:55.4720161Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4626 2022-05-18T04:06:55.4821224Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4627 2022-05-18T04:06:55.4926051Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 4628 2022-05-18T04:06:55.5031905Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 4629 2022-05-18T04:06:56.4252018Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqlqr1cl3 2022-05-18T04:06:56.4253874Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqlqr1cl3/_remote_module_non_scriptable.py 2022-05-18T04:06:56.4609825Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyo6m4197 2022-05-18T04:06:56.4612143Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyo6m4197/_remote_module_non_scriptable.py 2022-05-18T04:06:56.4778500Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgyqhd3dd 2022-05-18T04:06:56.4780975Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgyqhd3dd/_remote_module_non_scriptable.py 2022-05-18T04:06:56.5044865Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwf_qvnoi 2022-05-18T04:06:56.5047716Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwf_qvnoi/_remote_module_non_scriptable.py 2022-05-18T04:06:56.8370071Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:06:56.8629103Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:06:56.8911584Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:06:56.9180476Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:07:03.7231923Z ok (9.873s) 2022-05-18T04:07:03.7232131Z 2022-05-18T04:07:03.7232744Z ---------------------------------------------------------------------- 2022-05-18T04:07:03.7233894Z Ran 1 test in 9.874s 2022-05-18T04:07:03.7234105Z 2022-05-18T04:07:03.7234205Z OK 2022-05-18T04:07:03.7234349Z 2022-05-18T04:07:03.7234485Z Generating XML reports... 2022-05-18T04:07:03.7275987Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040653.xml 2022-05-18T04:07:04.9010676Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk_rjuavw 2022-05-18T04:07:04.9012351Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk_rjuavw/_remote_module_non_scriptable.py 2022-05-18T04:07:05.3109284Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:07:05.3123275Z 2022-05-18T04:07:05.3124111Z Running tests... 2022-05-18T04:07:05.3125089Z ---------------------------------------------------------------------- 2022-05-18T04:07:06.8957283Z test_cuda_future_can_extract_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:07:06.9337991Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4805 2022-05-18T04:07:06.9439691Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4806 2022-05-18T04:07:06.9544005Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 4807 2022-05-18T04:07:06.9649062Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 4808 2022-05-18T04:07:07.9363540Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpck6m47so 2022-05-18T04:07:07.9364403Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpck6m47so/_remote_module_non_scriptable.py 2022-05-18T04:07:07.9401419Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwi8lgrea 2022-05-18T04:07:07.9404337Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwi8lgrea/_remote_module_non_scriptable.py 2022-05-18T04:07:07.9460343Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiylggw3b 2022-05-18T04:07:07.9463093Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiylggw3b/_remote_module_non_scriptable.py 2022-05-18T04:07:07.9757436Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp60ggh2pl 2022-05-18T04:07:07.9760707Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp60ggh2pl/_remote_module_non_scriptable.py 2022-05-18T04:07:08.3437464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:07:08.3506854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:07:08.3542460Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:07:08.3835179Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:07:13.9817652Z ok (8.669s) 2022-05-18T04:07:13.9817870Z 2022-05-18T04:07:13.9818306Z ---------------------------------------------------------------------- 2022-05-18T04:07:13.9818655Z Ran 1 test in 8.669s 2022-05-18T04:07:13.9818804Z 2022-05-18T04:07:13.9818905Z OK 2022-05-18T04:07:13.9819047Z 2022-05-18T04:07:13.9819186Z Generating XML reports... 2022-05-18T04:07:13.9861735Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040705.xml 2022-05-18T04:07:15.1541146Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl7b_qn4i 2022-05-18T04:07:15.1542238Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl7b_qn4i/_remote_module_non_scriptable.py 2022-05-18T04:07:15.5681306Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:07:15.5696200Z 2022-05-18T04:07:15.5696589Z Running tests... 2022-05-18T04:07:15.5697290Z ---------------------------------------------------------------------- 2022-05-18T04:07:17.1578068Z test_cuda_future_can_extract_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:07:17.1962059Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5040 2022-05-18T04:07:17.2062551Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5041 2022-05-18T04:07:17.2166847Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5042 2022-05-18T04:07:17.2271694Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5043 2022-05-18T04:07:18.1504538Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp74a0ah0e 2022-05-18T04:07:18.1505619Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp74a0ah0e/_remote_module_non_scriptable.py 2022-05-18T04:07:18.2038295Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphl6k5hgo 2022-05-18T04:07:18.2040414Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphl6k5hgo/_remote_module_non_scriptable.py 2022-05-18T04:07:18.2157209Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdvs54dlv 2022-05-18T04:07:18.2159966Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdvs54dlv/_remote_module_non_scriptable.py 2022-05-18T04:07:18.2549434Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo5jnirf9 2022-05-18T04:07:18.2552222Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo5jnirf9/_remote_module_non_scriptable.py 2022-05-18T04:07:18.5484693Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:07:18.6158791Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:07:18.6193023Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:07:18.6688149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:07:24.2440092Z ok (8.674s) 2022-05-18T04:07:24.2440758Z 2022-05-18T04:07:24.2441267Z ---------------------------------------------------------------------- 2022-05-18T04:07:24.2441649Z Ran 1 test in 8.674s 2022-05-18T04:07:24.2441816Z 2022-05-18T04:07:24.2441911Z OK 2022-05-18T04:07:24.2442029Z 2022-05-18T04:07:24.2442189Z Generating XML reports... 2022-05-18T04:07:24.2486039Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040715.xml 2022-05-18T04:07:25.4280325Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjtz6sgyu 2022-05-18T04:07:25.4281808Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjtz6sgyu/_remote_module_non_scriptable.py 2022-05-18T04:07:25.8399608Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:07:25.8414493Z 2022-05-18T04:07:25.8414944Z Running tests... 2022-05-18T04:07:25.8415570Z ---------------------------------------------------------------------- 2022-05-18T04:07:27.4263175Z test_cuda_future_can_extract_custom_class_with_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:07:27.4648753Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5215 2022-05-18T04:07:27.4751492Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5216 2022-05-18T04:07:27.4856485Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5217 2022-05-18T04:07:27.4961998Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5218 2022-05-18T04:07:28.4143398Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyn_grnaa 2022-05-18T04:07:28.4144716Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyn_grnaa/_remote_module_non_scriptable.py 2022-05-18T04:07:28.4432942Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt3mcaweb 2022-05-18T04:07:28.4435384Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt3mcaweb/_remote_module_non_scriptable.py 2022-05-18T04:07:28.4453423Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbx_6cs7w 2022-05-18T04:07:28.4456251Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbx_6cs7w/_remote_module_non_scriptable.py 2022-05-18T04:07:28.4516996Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps7ifz44j 2022-05-18T04:07:28.4519948Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps7ifz44j/_remote_module_non_scriptable.py 2022-05-18T04:07:28.8162758Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:07:28.8495441Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:07:28.8532719Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:07:28.8548285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:07:34.0120937Z ok (8.170s) 2022-05-18T04:07:34.0121169Z 2022-05-18T04:07:34.0121587Z ---------------------------------------------------------------------- 2022-05-18T04:07:34.0122251Z Ran 1 test in 8.171s 2022-05-18T04:07:34.0122421Z 2022-05-18T04:07:34.0122516Z OK 2022-05-18T04:07:34.0122653Z 2022-05-18T04:07:34.0122790Z Generating XML reports... 2022-05-18T04:07:34.0166426Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040725.xml 2022-05-18T04:07:35.2077657Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7j3s1uvh 2022-05-18T04:07:35.2078898Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7j3s1uvh/_remote_module_non_scriptable.py 2022-05-18T04:07:35.6185414Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:07:35.6200533Z 2022-05-18T04:07:35.6200762Z Running tests... 2022-05-18T04:07:35.6201199Z ---------------------------------------------------------------------- 2022-05-18T04:07:37.2149589Z test_cuda_future_can_extract_custom_class_with_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:07:37.2536133Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5454 2022-05-18T04:07:37.2638770Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5455 2022-05-18T04:07:37.2745860Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5456 2022-05-18T04:07:37.2853775Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5457 2022-05-18T04:07:38.2298747Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmwo1aby_ 2022-05-18T04:07:38.2299590Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmwo1aby_/_remote_module_non_scriptable.py 2022-05-18T04:07:38.2554185Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7c4m9xg5 2022-05-18T04:07:38.2556978Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7c4m9xg5/_remote_module_non_scriptable.py 2022-05-18T04:07:38.2765961Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbkbsxj9h 2022-05-18T04:07:38.2768964Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbkbsxj9h/_remote_module_non_scriptable.py 2022-05-18T04:07:38.3109127Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphqsye_69 2022-05-18T04:07:38.3111862Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphqsye_69/_remote_module_non_scriptable.py 2022-05-18T04:07:38.6296468Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:07:38.6585799Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:07:38.6829577Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:07:38.7310259Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:07:44.3023339Z ok (8.682s) 2022-05-18T04:07:44.3023583Z 2022-05-18T04:07:44.3024207Z ---------------------------------------------------------------------- 2022-05-18T04:07:44.3024559Z Ran 1 test in 8.682s 2022-05-18T04:07:44.3024706Z 2022-05-18T04:07:44.3024808Z OK 2022-05-18T04:07:44.3024945Z 2022-05-18T04:07:44.3025079Z Generating XML reports... 2022-05-18T04:07:44.3069164Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040735.xml 2022-05-18T04:07:45.4872570Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfwdb2fci 2022-05-18T04:07:45.4873580Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfwdb2fci/_remote_module_non_scriptable.py 2022-05-18T04:07:45.8932342Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:07:45.8947346Z 2022-05-18T04:07:45.8947590Z Running tests... 2022-05-18T04:07:45.8948015Z ---------------------------------------------------------------------- 2022-05-18T04:07:47.4701806Z test_cuda_future_can_extract_list_with_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:07:47.5089118Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5633 2022-05-18T04:07:47.5192968Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5634 2022-05-18T04:07:47.5299418Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5635 2022-05-18T04:07:47.5405246Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5636 2022-05-18T04:07:48.4138246Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpacfwl5v7 2022-05-18T04:07:48.4139439Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpacfwl5v7/_remote_module_non_scriptable.py 2022-05-18T04:07:48.4379792Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoupdsgav 2022-05-18T04:07:48.4382576Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoupdsgav/_remote_module_non_scriptable.py 2022-05-18T04:07:48.5041690Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpft6vzj_1 2022-05-18T04:07:48.5043182Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpft6vzj_1/_remote_module_non_scriptable.py 2022-05-18T04:07:48.5339338Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbcby36sx 2022-05-18T04:07:48.5342112Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbcby36sx/_remote_module_non_scriptable.py 2022-05-18T04:07:48.8126950Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:07:48.8384931Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:07:48.9123564Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:07:48.9481536Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:07:54.4571187Z ok (8.562s) 2022-05-18T04:07:54.4571425Z 2022-05-18T04:07:54.4571840Z ---------------------------------------------------------------------- 2022-05-18T04:07:54.4572184Z Ran 1 test in 8.562s 2022-05-18T04:07:54.4572334Z 2022-05-18T04:07:54.4572431Z OK 2022-05-18T04:07:54.4574651Z 2022-05-18T04:07:54.4574910Z Generating XML reports... 2022-05-18T04:07:54.4615830Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040745.xml 2022-05-18T04:07:55.6256596Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp91ypqhtv 2022-05-18T04:07:55.6257494Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp91ypqhtv/_remote_module_non_scriptable.py 2022-05-18T04:07:56.0374039Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:07:56.0389097Z 2022-05-18T04:07:56.0389389Z Running tests... 2022-05-18T04:07:56.0389833Z ---------------------------------------------------------------------- 2022-05-18T04:07:57.6248120Z test_cuda_future_can_extract_list_with_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:07:57.6637564Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5868 2022-05-18T04:07:57.6741545Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5869 2022-05-18T04:07:57.6848272Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5870 2022-05-18T04:07:57.6954383Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5871 2022-05-18T04:07:58.6419151Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbuewx9nu 2022-05-18T04:07:58.6420635Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbuewx9nu/_remote_module_non_scriptable.py 2022-05-18T04:07:58.6422632Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt22c7cpm 2022-05-18T04:07:58.6426284Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt22c7cpm/_remote_module_non_scriptable.py 2022-05-18T04:07:58.6455277Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_g5tsijp 2022-05-18T04:07:58.6458156Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_g5tsijp/_remote_module_non_scriptable.py 2022-05-18T04:07:58.6552172Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptmt63his 2022-05-18T04:07:58.6555211Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptmt63his/_remote_module_non_scriptable.py 2022-05-18T04:07:59.0466359Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:07:59.0510255Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:07:59.0542237Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:07:59.0571914Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:08:04.7122739Z ok (8.673s) 2022-05-18T04:08:04.7122983Z 2022-05-18T04:08:04.7123381Z ---------------------------------------------------------------------- 2022-05-18T04:08:04.7123723Z Ran 1 test in 8.673s 2022-05-18T04:08:04.7123895Z 2022-05-18T04:08:04.7123986Z OK 2022-05-18T04:08:04.7124104Z 2022-05-18T04:08:04.7124234Z Generating XML reports... 2022-05-18T04:08:04.7167797Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040756.xml 2022-05-18T04:08:05.8762006Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6jcyf_hg 2022-05-18T04:08:05.8763370Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6jcyf_hg/_remote_module_non_scriptable.py 2022-05-18T04:08:06.2780127Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:08:06.2795394Z 2022-05-18T04:08:06.2795535Z Running tests... 2022-05-18T04:08:06.2796344Z ---------------------------------------------------------------------- 2022-05-18T04:08:07.8331129Z test_cuda_future_device_as_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
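The test_cuda_future_can_extract_* cases finishing above exercise CUDA-aware futures that hold GPU tensors (plain, sparse, inside lists or custom classes). As a minimal hedged sketch, not the tests' own code and assuming a CUDA device is available, such a future can be created and its value extracted like this:

import torch
from torch.futures import Future

fut = Future(devices=["cuda:0"])              # future whose value may live on cuda:0
fut.set_result(torch.ones(2, 2, device="cuda:0"))
value = fut.wait()                            # extract the CUDA tensor back out
print(value.device, value.sum().item())
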
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:08:07.8710481Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6043 2022-05-18T04:08:07.8813537Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6044 2022-05-18T04:08:07.8918165Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 6045 2022-05-18T04:08:07.9024474Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 6046 2022-05-18T04:08:08.7969651Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpair5_77p 2022-05-18T04:08:08.7970638Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpair5_77p/_remote_module_non_scriptable.py 2022-05-18T04:08:08.7990091Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8q3s7ri6 2022-05-18T04:08:08.7992811Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8q3s7ri6/_remote_module_non_scriptable.py 2022-05-18T04:08:08.8220503Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3mvenjbm 2022-05-18T04:08:08.8223121Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3mvenjbm/_remote_module_non_scriptable.py 2022-05-18T04:08:08.8473776Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxlm13hah 2022-05-18T04:08:08.8476632Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxlm13hah/_remote_module_non_scriptable.py 2022-05-18T04:08:09.1980652Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:08:09.2010595Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:08:09.2256213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:08:09.2679578Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:08:09.5080930Z ok (3.228s) 2022-05-18T04:08:09.5081152Z 2022-05-18T04:08:09.5081560Z ---------------------------------------------------------------------- 2022-05-18T04:08:09.5081885Z Ran 1 test in 3.229s 2022-05-18T04:08:09.5082175Z 2022-05-18T04:08:09.5082316Z OK 2022-05-18T04:08:09.5082490Z 2022-05-18T04:08:09.5082676Z Generating XML reports... 2022-05-18T04:08:09.5125051Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040806.xml 2022-05-18T04:08:10.6780463Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpml1vnfqu 2022-05-18T04:08:10.6796397Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpml1vnfqu/_remote_module_non_scriptable.py 2022-05-18T04:08:11.0897655Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:08:11.0912475Z 2022-05-18T04:08:11.0912626Z Running tests... 2022-05-18T04:08:11.0913072Z ---------------------------------------------------------------------- 2022-05-18T04:08:12.6646654Z test_cuda_future_device_as_int (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:08:12.7026909Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6214 2022-05-18T04:08:12.7128735Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6215 2022-05-18T04:08:12.7232352Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 6216 2022-05-18T04:08:12.7336630Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 6217 2022-05-18T04:08:13.6986949Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgazn1kyy 2022-05-18T04:08:13.6988087Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgazn1kyy/_remote_module_non_scriptable.py 2022-05-18T04:08:13.7089438Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl_qnn2_6 2022-05-18T04:08:13.7092523Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl_qnn2_6/_remote_module_non_scriptable.py 2022-05-18T04:08:13.7101173Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg_ffec6g 2022-05-18T04:08:13.7104336Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg_ffec6g/_remote_module_non_scriptable.py 2022-05-18T04:08:13.7433791Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgae1i3a5 2022-05-18T04:08:13.7436313Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgae1i3a5/_remote_module_non_scriptable.py 2022-05-18T04:08:14.1024022Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:08:14.1093449Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:08:14.1203942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:08:14.1478367Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:08:14.3394184Z ok (3.248s) 2022-05-18T04:08:14.3394589Z 2022-05-18T04:08:14.3395094Z ---------------------------------------------------------------------- 2022-05-18T04:08:14.3395481Z Ran 1 test in 3.248s 2022-05-18T04:08:14.3395651Z 2022-05-18T04:08:14.3395757Z OK 2022-05-18T04:08:14.3395875Z 2022-05-18T04:08:14.3396005Z Generating XML reports... 2022-05-18T04:08:14.3438719Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040811.xml 2022-05-18T04:08:15.4948638Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpckjxeu5i 2022-05-18T04:08:15.4949816Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpckjxeu5i/_remote_module_non_scriptable.py 2022-05-18T04:08:15.8924747Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:08:15.8938783Z 2022-05-18T04:08:15.8939077Z Running tests... 2022-05-18T04:08:15.8939659Z ---------------------------------------------------------------------- 2022-05-18T04:08:17.4479574Z test_cuda_future_device_as_str (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:08:17.4861256Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6385 2022-05-18T04:08:17.4964067Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6386 2022-05-18T04:08:17.5069720Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 6387 2022-05-18T04:08:17.5175538Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 6388 2022-05-18T04:08:18.4288796Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0lz80xml 2022-05-18T04:08:18.4290471Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0lz80xml/_remote_module_non_scriptable.py 2022-05-18T04:08:18.4313921Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9fvp552r 2022-05-18T04:08:18.4316841Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9fvp552r/_remote_module_non_scriptable.py 2022-05-18T04:08:18.4418038Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptyp7_wpd 2022-05-18T04:08:18.4421178Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptyp7_wpd/_remote_module_non_scriptable.py 2022-05-18T04:08:18.4560235Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1guus4_u 2022-05-18T04:08:18.4563011Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1guus4_u/_remote_module_non_scriptable.py 2022-05-18T04:08:18.8351668Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:08:18.8442658Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:08:18.8471059Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:08:18.8600134Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:08:19.0229050Z ok (3.129s) 2022-05-18T04:08:19.0229282Z 2022-05-18T04:08:19.0230000Z ---------------------------------------------------------------------- 2022-05-18T04:08:19.0230355Z Ran 1 test in 3.129s 2022-05-18T04:08:19.0230521Z 2022-05-18T04:08:19.0230623Z OK 2022-05-18T04:08:19.0230760Z 2022-05-18T04:08:19.0230894Z Generating XML reports... 2022-05-18T04:08:19.0274545Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040815.xml 2022-05-18T04:08:20.1846019Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1bv0vf0c 2022-05-18T04:08:20.1851507Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1bv0vf0c/_remote_module_non_scriptable.py 2022-05-18T04:08:20.5950632Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:08:20.5968169Z 2022-05-18T04:08:20.5968602Z Running tests... 2022-05-18T04:08:20.5969038Z ---------------------------------------------------------------------- 2022-05-18T04:08:22.2019214Z test_cuda_future_device_not_cuda (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:08:22.2408243Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6556 2022-05-18T04:08:22.2513815Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6557 2022-05-18T04:08:22.2620011Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 6558 2022-05-18T04:08:22.2728993Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 6559 2022-05-18T04:08:23.1654265Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbps1skes 2022-05-18T04:08:23.1655505Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbps1skes/_remote_module_non_scriptable.py 2022-05-18T04:08:23.1703993Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc3pw_4xw 2022-05-18T04:08:23.1706886Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc3pw_4xw/_remote_module_non_scriptable.py 2022-05-18T04:08:23.1920049Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4h650lzn 2022-05-18T04:08:23.1922675Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4h650lzn/_remote_module_non_scriptable.py 2022-05-18T04:08:23.1978072Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2lefli_0 2022-05-18T04:08:23.1980775Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2lefli_0/_remote_module_non_scriptable.py 2022-05-18T04:08:23.5653088Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:08:23.5737894Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:08:23.5965415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:08:23.5976806Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:08:23.7782851Z ok (3.181s) 2022-05-18T04:08:23.7783091Z 2022-05-18T04:08:23.7783495Z ---------------------------------------------------------------------- 2022-05-18T04:08:23.7784121Z Ran 1 test in 3.181s 2022-05-18T04:08:23.7784292Z 2022-05-18T04:08:23.7784389Z OK 2022-05-18T04:08:23.7784522Z 2022-05-18T04:08:23.7784660Z Generating XML reports... 2022-05-18T04:08:23.7828844Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040820.xml 2022-05-18T04:08:24.9444877Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppsfvda71 2022-05-18T04:08:24.9446113Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppsfvda71/_remote_module_non_scriptable.py 2022-05-18T04:08:25.3432497Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:08:25.3446505Z 2022-05-18T04:08:25.3446945Z Running tests... 2022-05-18T04:08:25.3447675Z ---------------------------------------------------------------------- 2022-05-18T04:08:26.8942901Z test_cuda_future_modify_tensor_inplace (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
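The test_cuda_future_device_as_device / _as_int / _as_str / _device_not_cuda cases just completed suggest the devices argument accepts several spellings of the same device. A hedged illustration (assuming a CUDA device; the integer form is inferred from the test name, not verified here):

import torch
from torch.futures import Future

fut_str = Future(devices=["cuda:0"])                  # device given as a string
fut_dev = Future(devices=[torch.device("cuda", 0)])   # device given as torch.device
# fut_int = Future(devices=[0])   # bare index form implied by test_cuda_future_device_as_int (assumption)
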
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:08:26.9321647Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6727 2022-05-18T04:08:26.9422854Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6728 2022-05-18T04:08:26.9528507Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 6729 2022-05-18T04:08:26.9633694Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 6730 2022-05-18T04:08:27.8499351Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo74ugx9u 2022-05-18T04:08:27.8500196Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo74ugx9u/_remote_module_non_scriptable.py 2022-05-18T04:08:27.8697162Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcnm98g5p 2022-05-18T04:08:27.8699654Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcnm98g5p/_remote_module_non_scriptable.py 2022-05-18T04:08:27.8957941Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz6buoefa 2022-05-18T04:08:27.8960686Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz6buoefa/_remote_module_non_scriptable.py 2022-05-18T04:08:27.8984572Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmporsc9w2o 2022-05-18T04:08:27.8987837Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmporsc9w2o/_remote_module_non_scriptable.py 2022-05-18T04:08:28.2489804Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:08:28.2746131Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:08:28.2997294Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:08:28.3025626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:08:29.9723913Z ok (4.627s) 2022-05-18T04:08:29.9724136Z 2022-05-18T04:08:29.9724536Z ---------------------------------------------------------------------- 2022-05-18T04:08:29.9724910Z Ran 1 test in 4.628s 2022-05-18T04:08:29.9725109Z 2022-05-18T04:08:29.9725185Z OK 2022-05-18T04:08:29.9725329Z 2022-05-18T04:08:29.9725462Z Generating XML reports... 2022-05-18T04:08:29.9768677Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040825.xml 2022-05-18T04:08:31.1593960Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoo24m5ak 2022-05-18T04:08:31.1595075Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoo24m5ak/_remote_module_non_scriptable.py 2022-05-18T04:08:31.5818314Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:08:31.5834191Z 2022-05-18T04:08:31.5834609Z Running tests... 2022-05-18T04:08:31.5835044Z ---------------------------------------------------------------------- 2022-05-18T04:08:33.1672655Z test_cuda_future_replace_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:08:33.2060361Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6902 2022-05-18T04:08:33.2164211Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6903 2022-05-18T04:08:33.2270866Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 6904 2022-05-18T04:08:33.2378494Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 6905 2022-05-18T04:08:34.1539233Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy_f4py37 2022-05-18T04:08:34.1540593Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy_f4py37/_remote_module_non_scriptable.py 2022-05-18T04:08:34.1729433Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppc9fo542 2022-05-18T04:08:34.1731623Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppc9fo542/_remote_module_non_scriptable.py 2022-05-18T04:08:34.1804606Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkyrpe1k5 2022-05-18T04:08:34.1807394Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkyrpe1k5/_remote_module_non_scriptable.py 2022-05-18T04:08:34.2127784Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl_d0zdjo 2022-05-18T04:08:34.2130465Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl_d0zdjo/_remote_module_non_scriptable.py 2022-05-18T04:08:34.5513646Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:08:34.5802962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:08:34.5832832Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:08:34.6294545Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:08:36.3470120Z ok (4.763s) 2022-05-18T04:08:36.3470421Z 2022-05-18T04:08:36.3471022Z ---------------------------------------------------------------------- 2022-05-18T04:08:36.3471664Z Ran 1 test in 4.764s 2022-05-18T04:08:36.3471829Z 2022-05-18T04:08:36.3471923Z OK 2022-05-18T04:08:36.3472040Z 2022-05-18T04:08:36.3472175Z Generating XML reports... 2022-05-18T04:08:36.3515386Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040831.xml 2022-05-18T04:08:37.5228683Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuxavyzby 2022-05-18T04:08:37.5229780Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuxavyzby/_remote_module_non_scriptable.py 2022-05-18T04:08:37.9351587Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:08:37.9366699Z 2022-05-18T04:08:37.9367188Z Running tests... 2022-05-18T04:08:37.9367707Z ---------------------------------------------------------------------- 2022-05-18T04:08:39.5203805Z test_cuda_future_value_on_bad_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:08:39.5591199Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7077 2022-05-18T04:08:39.5693549Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7078 2022-05-18T04:08:39.5799467Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 7079 2022-05-18T04:08:39.5906774Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 7080 2022-05-18T04:08:40.4620189Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpidugnsrx 2022-05-18T04:08:40.4621122Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpidugnsrx/_remote_module_non_scriptable.py 2022-05-18T04:08:40.5172612Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj74kyg0l 2022-05-18T04:08:40.5174821Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj74kyg0l/_remote_module_non_scriptable.py 2022-05-18T04:08:40.5295145Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8p1x1d1q 2022-05-18T04:08:40.5297953Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8p1x1d1q/_remote_module_non_scriptable.py 2022-05-18T04:08:40.5309580Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8cbbspfq 2022-05-18T04:08:40.5312433Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8cbbspfq/_remote_module_non_scriptable.py 2022-05-18T04:08:40.8599690Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:08:40.9360095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:08:40.9377126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:08:40.9380488Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:08:47.7101633Z ok (9.773s) 2022-05-18T04:08:47.7101991Z 2022-05-18T04:08:47.7102772Z ---------------------------------------------------------------------- 2022-05-18T04:08:47.7103372Z Ran 1 test in 9.773s 2022-05-18T04:08:47.7103547Z 2022-05-18T04:08:47.7103818Z OK 2022-05-18T04:08:47.7103958Z 2022-05-18T04:08:47.7104093Z Generating XML reports... 2022-05-18T04:08:47.7149075Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040837.xml 2022-05-18T04:08:48.8897514Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4oqtkl5i 2022-05-18T04:08:48.8898593Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4oqtkl5i/_remote_module_non_scriptable.py 2022-05-18T04:08:49.3014942Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:08:49.3030579Z 2022-05-18T04:08:49.3030813Z Running tests... 2022-05-18T04:08:49.3031258Z ---------------------------------------------------------------------- 2022-05-18T04:08:50.8760466Z test_custom_stream (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
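Several of the preceding cases (callbacks, in-place modification, value extraction on a bad device) revolve around what happens to a CUDA future's value after completion. A small sketch of chaining a callback with Future.then(), illustrative only and assuming a CUDA device:

import torch
from torch.futures import Future

fut = Future(devices=["cuda:0"])
chained = fut.then(lambda f: f.wait() * 2)     # runs once `fut` is marked complete
fut.set_result(torch.ones(4, device="cuda:0"))
print(chained.wait())                          # tensor of 2s on cuda:0
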
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:08:50.9146844Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7256 2022-05-18T04:08:50.9252486Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7257 2022-05-18T04:08:50.9356467Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 7258 2022-05-18T04:08:50.9462268Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 7259 2022-05-18T04:08:51.8814401Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpprkklbh2 2022-05-18T04:08:51.8815148Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpprkklbh2/_remote_module_non_scriptable.py 2022-05-18T04:08:51.8959981Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpug_6jy56 2022-05-18T04:08:51.8962698Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpug_6jy56/_remote_module_non_scriptable.py 2022-05-18T04:08:51.9281490Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptahq44ga 2022-05-18T04:08:51.9284248Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptahq44ga/_remote_module_non_scriptable.py 2022-05-18T04:08:51.9386508Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc8w32ktk 2022-05-18T04:08:51.9389491Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc8w32ktk/_remote_module_non_scriptable.py 2022-05-18T04:08:52.2787331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:08:52.2956271Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:08:52.3390815Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:08:52.3411694Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:08:59.9684059Z ok (10.665s) 2022-05-18T04:08:59.9684373Z 2022-05-18T04:08:59.9684952Z ---------------------------------------------------------------------- 2022-05-18T04:08:59.9685284Z Ran 1 test in 10.665s 2022-05-18T04:08:59.9685454Z 2022-05-18T04:08:59.9685550Z OK 2022-05-18T04:08:59.9685684Z 2022-05-18T04:08:59.9685818Z Generating XML reports... 2022-05-18T04:08:59.9728813Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040849.xml 2022-05-18T04:09:01.1413009Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb6ocz50f 2022-05-18T04:09:01.1414158Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb6ocz50f/_remote_module_non_scriptable.py 2022-05-18T04:09:01.5519217Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:09:01.5533891Z 2022-05-18T04:09:01.5534034Z Running tests... 2022-05-18T04:09:01.5534823Z ---------------------------------------------------------------------- 2022-05-18T04:09:03.1446193Z test_custom_stream_multi (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:09:03.1834060Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7611 2022-05-18T04:09:03.1935982Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7612 2022-05-18T04:09:03.2042952Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 7613 2022-05-18T04:09:03.2149888Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 7614 2022-05-18T04:09:04.1728563Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsp00pt7d 2022-05-18T04:09:04.1729713Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsp00pt7d/_remote_module_non_scriptable.py 2022-05-18T04:09:04.1747500Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc1oirtpi 2022-05-18T04:09:04.1750404Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc1oirtpi/_remote_module_non_scriptable.py 2022-05-18T04:09:04.2087584Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvx06632j 2022-05-18T04:09:04.2090067Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvx06632j/_remote_module_non_scriptable.py 2022-05-18T04:09:04.2624609Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuohnqeuq 2022-05-18T04:09:04.2627611Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuohnqeuq/_remote_module_non_scriptable.py 2022-05-18T04:09:04.5731244Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:09:04.5808255Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:09:04.6128509Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:09:04.6729997Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:09:18.8506763Z ok (17.297s) 2022-05-18T04:09:18.8507202Z 2022-05-18T04:09:18.8507600Z ---------------------------------------------------------------------- 2022-05-18T04:09:18.8507968Z Ran 1 test in 17.297s 2022-05-18T04:09:18.8508136Z 2022-05-18T04:09:18.8508235Z OK 2022-05-18T04:09:18.8508374Z 2022-05-18T04:09:18.8508510Z Generating XML reports... 2022-05-18T04:09:18.8552807Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040901.xml 2022-05-18T04:09:20.0252980Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu3j48628 2022-05-18T04:09:20.0254208Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu3j48628/_remote_module_non_scriptable.py 2022-05-18T04:09:20.4354931Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:09:20.4370412Z 2022-05-18T04:09:20.4370935Z Running tests... 2022-05-18T04:09:20.4371421Z ---------------------------------------------------------------------- 2022-05-18T04:09:22.0125974Z test_custom_stream_nested (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:09:22.0505626Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7966 2022-05-18T04:09:22.0607769Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7967 2022-05-18T04:09:22.0714269Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 7968 2022-05-18T04:09:22.0818882Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 7969 2022-05-18T04:09:22.9830740Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdlh2vtmb 2022-05-18T04:09:22.9831664Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdlh2vtmb/_remote_module_non_scriptable.py 2022-05-18T04:09:22.9947839Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeieenbnv 2022-05-18T04:09:22.9950777Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeieenbnv/_remote_module_non_scriptable.py 2022-05-18T04:09:23.0001100Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6faeg61m 2022-05-18T04:09:23.0003822Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6faeg61m/_remote_module_non_scriptable.py 2022-05-18T04:09:23.0267633Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpviqh0jdv 2022-05-18T04:09:23.0270272Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpviqh0jdv/_remote_module_non_scriptable.py 2022-05-18T04:09:23.3796470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:09:23.4009977Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:09:23.4066356Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:09:23.4256106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:09:32.1057062Z ok (11.668s) 2022-05-18T04:09:32.1057272Z 2022-05-18T04:09:32.1057681Z ---------------------------------------------------------------------- 2022-05-18T04:09:32.1058037Z Ran 1 test in 11.669s 2022-05-18T04:09:32.1058208Z 2022-05-18T04:09:32.1058307Z OK 2022-05-18T04:09:32.1058443Z 2022-05-18T04:09:32.1058580Z Generating XML reports... 2022-05-18T04:09:32.1101576Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040920.xml 2022-05-18T04:09:33.2743073Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6luwj50k 2022-05-18T04:09:33.2744296Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6luwj50k/_remote_module_non_scriptable.py 2022-05-18T04:09:33.6881231Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:09:33.6896221Z 2022-05-18T04:09:33.6896628Z Running tests... 2022-05-18T04:09:33.6897125Z ---------------------------------------------------------------------- 2022-05-18T04:09:35.2774092Z test_custom_stream_nested_multi (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:09:35.3164615Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8321 2022-05-18T04:09:35.3268983Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8322 2022-05-18T04:09:35.3375699Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 8323 2022-05-18T04:09:35.3482553Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 8324 2022-05-18T04:09:36.2107452Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc_7msyxb 2022-05-18T04:09:36.2108520Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc_7msyxb/_remote_module_non_scriptable.py 2022-05-18T04:09:36.2735455Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1bvuxfjh 2022-05-18T04:09:36.2737482Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1bvuxfjh/_remote_module_non_scriptable.py 2022-05-18T04:09:36.2963880Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc_vfktkf 2022-05-18T04:09:36.2966818Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc_vfktkf/_remote_module_non_scriptable.py 2022-05-18T04:09:36.3033602Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprdyi5ilo 2022-05-18T04:09:36.3036850Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprdyi5ilo/_remote_module_non_scriptable.py 2022-05-18T04:09:36.6070529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:09:36.6850374Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:09:36.6992620Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:09:36.7136570Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:09:43.4686317Z ok (9.779s) 2022-05-18T04:09:43.4686541Z 2022-05-18T04:09:43.4686954Z ---------------------------------------------------------------------- 2022-05-18T04:09:43.4687310Z Ran 1 test in 9.779s 2022-05-18T04:09:43.4687476Z 2022-05-18T04:09:43.4687574Z OK 2022-05-18T04:09:43.4687709Z 2022-05-18T04:09:43.4687844Z Generating XML reports... 2022-05-18T04:09:43.4731606Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040933.xml 2022-05-18T04:09:44.6330773Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqpetmh42 2022-05-18T04:09:44.6332196Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqpetmh42/_remote_module_non_scriptable.py 2022-05-18T04:09:45.0403932Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:09:45.0418109Z 2022-05-18T04:09:45.0418252Z Running tests... 2022-05-18T04:09:45.0418932Z ---------------------------------------------------------------------- 2022-05-18T04:09:46.6031164Z test_device_map_cpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
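The test_custom_stream* family that finishes above checks behavior when work runs on user-provided CUDA streams rather than the default stream. As background only, not the tests' implementation and assuming a CUDA device, basic side-stream usage looks like:

import torch

s = torch.cuda.Stream()
with torch.cuda.stream(s):                     # enqueue work on the side stream
    x = torch.randn(1024, 1024, device="cuda")
    y = x @ x
torch.cuda.current_stream().wait_stream(s)     # make the default stream wait on it
print(y.norm().item())
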
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:09:46.6409732Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8671 2022-05-18T04:09:46.6513250Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8672 2022-05-18T04:09:46.6618717Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 8673 2022-05-18T04:09:46.6724752Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 8674 2022-05-18T04:09:47.6086199Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp02mxae0l 2022-05-18T04:09:47.6087060Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp02mxae0l/_remote_module_non_scriptable.py 2022-05-18T04:09:47.6302187Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxm211fu4 2022-05-18T04:09:47.6305009Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxm211fu4/_remote_module_non_scriptable.py 2022-05-18T04:09:47.6615038Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmmoyilpi 2022-05-18T04:09:47.6617916Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmmoyilpi/_remote_module_non_scriptable.py 2022-05-18T04:09:47.6788283Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7f65zsm6 2022-05-18T04:09:47.6790809Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7f65zsm6/_remote_module_non_scriptable.py 2022-05-18T04:09:48.0122777Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:09:48.0390072Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:09:48.0621055Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:09:48.0780095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:09:48.6791357Z ok (3.637s) 2022-05-18T04:09:48.6791800Z 2022-05-18T04:09:48.6792903Z ---------------------------------------------------------------------- 2022-05-18T04:09:48.6793284Z Ran 1 test in 3.637s 2022-05-18T04:09:48.6793449Z 2022-05-18T04:09:48.6793543Z OK 2022-05-18T04:09:48.6793678Z 2022-05-18T04:09:48.6793793Z Generating XML reports... 2022-05-18T04:09:48.6835230Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040945.xml 2022-05-18T04:09:49.8513814Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp770lpu_h 2022-05-18T04:09:49.8515247Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp770lpu_h/_remote_module_non_scriptable.py 2022-05-18T04:09:50.2663556Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:09:50.2678762Z 2022-05-18T04:09:50.2679069Z Running tests... 2022-05-18T04:09:50.2679532Z ---------------------------------------------------------------------- 2022-05-18T04:09:51.8317646Z test_device_map_cpu_to_gpu_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:09:51.8705593Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9010 2022-05-18T04:09:51.8808439Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9011 2022-05-18T04:09:51.8913651Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 9012 2022-05-18T04:09:51.9020586Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 9013 2022-05-18T04:09:52.8333717Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpofyvldlk 2022-05-18T04:09:52.8334712Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpofyvldlk/_remote_module_non_scriptable.py 2022-05-18T04:09:52.8722080Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp12xprdi8 2022-05-18T04:09:52.8724884Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp12xprdi8/_remote_module_non_scriptable.py 2022-05-18T04:09:52.8916886Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa113zdwj 2022-05-18T04:09:52.8919539Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa113zdwj/_remote_module_non_scriptable.py 2022-05-18T04:09:52.9024970Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgwy0robo 2022-05-18T04:09:52.9028168Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgwy0robo/_remote_module_non_scriptable.py 2022-05-18T04:09:53.2318982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:09:53.2719865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:09:53.2916298Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:09:53.3116455Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:09:56.6151163Z ok (6.347s) 2022-05-18T04:09:56.6151389Z 2022-05-18T04:09:56.6151798Z ---------------------------------------------------------------------- 2022-05-18T04:09:56.6152122Z Ran 1 test in 6.347s 2022-05-18T04:09:56.6152290Z 2022-05-18T04:09:56.6152395Z OK 2022-05-18T04:09:56.6152529Z 2022-05-18T04:09:56.6152663Z Generating XML reports... 2022-05-18T04:09:56.6195498Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040950.xml 2022-05-18T04:09:57.7667467Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdxddm6g3 2022-05-18T04:09:57.7668317Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdxddm6g3/_remote_module_non_scriptable.py 2022-05-18T04:09:58.1674345Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:09:58.1688201Z 2022-05-18T04:09:58.1688441Z Running tests... 2022-05-18T04:09:58.1689278Z ---------------------------------------------------------------------- 2022-05-18T04:09:59.7425157Z test_device_map_cpu_to_gpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:09:59.7806627Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9357 2022-05-18T04:09:59.7913239Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9358 2022-05-18T04:09:59.8020024Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 9359 2022-05-18T04:09:59.8126292Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 9360 2022-05-18T04:10:00.6989435Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2ykzun4v 2022-05-18T04:10:00.6990657Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2ykzun4v/_remote_module_non_scriptable.py 2022-05-18T04:10:00.7155384Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi_4x7uca 2022-05-18T04:10:00.7158004Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi_4x7uca/_remote_module_non_scriptable.py 2022-05-18T04:10:00.7618365Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmmhnmqbu 2022-05-18T04:10:00.7620964Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmmhnmqbu/_remote_module_non_scriptable.py 2022-05-18T04:10:00.7755376Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnxv55rv8 2022-05-18T04:10:00.7758267Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnxv55rv8/_remote_module_non_scriptable.py 2022-05-18T04:10:01.1113162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:10:01.1165460Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:10:01.1670827Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:10:01.1891533Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:10:04.5256499Z ok (6.356s) 2022-05-18T04:10:04.5256721Z 2022-05-18T04:10:04.5257129Z ---------------------------------------------------------------------- 2022-05-18T04:10:04.5257471Z Ran 1 test in 6.357s 2022-05-18T04:10:04.5257643Z 2022-05-18T04:10:04.5257741Z OK 2022-05-18T04:10:04.5257881Z 2022-05-18T04:10:04.5258011Z Generating XML reports... 2022-05-18T04:10:04.5301479Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040958.xml 2022-05-18T04:10:05.6990332Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyz8zrozn 2022-05-18T04:10:05.6991800Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyz8zrozn/_remote_module_non_scriptable.py 2022-05-18T04:10:06.1098999Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:10:06.1114616Z 2022-05-18T04:10:06.1115041Z Running tests... 2022-05-18T04:10:06.1115551Z ---------------------------------------------------------------------- 2022-05-18T04:10:07.6979503Z test_device_map_gpu_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:10:07.7360404Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9704 2022-05-18T04:10:07.7461845Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9705 2022-05-18T04:10:07.7567941Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 9706 2022-05-18T04:10:07.7672818Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 9707 2022-05-18T04:10:08.6380739Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp842ijyla 2022-05-18T04:10:08.6381916Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp842ijyla/_remote_module_non_scriptable.py 2022-05-18T04:10:08.6496086Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgvl35aww 2022-05-18T04:10:08.6498283Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgvl35aww/_remote_module_non_scriptable.py 2022-05-18T04:10:08.6702577Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwfy43t5z 2022-05-18T04:10:08.6704765Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwfy43t5z/_remote_module_non_scriptable.py 2022-05-18T04:10:08.7181967Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdy60egak 2022-05-18T04:10:08.7184170Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdy60egak/_remote_module_non_scriptable.py 2022-05-18T04:10:09.0549863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:10:09.0574847Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:10:09.0714559Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:10:09.1184880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:10:12.4803994Z ok (6.369s) 2022-05-18T04:10:12.4804587Z 2022-05-18T04:10:12.4805235Z ---------------------------------------------------------------------- 2022-05-18T04:10:12.4805950Z Ran 1 test in 6.369s 2022-05-18T04:10:12.4806118Z 2022-05-18T04:10:12.4806214Z OK 2022-05-18T04:10:12.4806350Z 2022-05-18T04:10:12.4808632Z Generating XML reports... 2022-05-18T04:10:12.4848728Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041006.xml 2022-05-18T04:10:13.6305952Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplnrohgqx 2022-05-18T04:10:13.6307149Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplnrohgqx/_remote_module_non_scriptable.py 2022-05-18T04:10:14.0289803Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:10:14.0303552Z 2022-05-18T04:10:14.0303922Z Running tests... 2022-05-18T04:10:14.0304441Z ---------------------------------------------------------------------- 2022-05-18T04:10:15.5716457Z test_device_map_gpu_default_to_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:10:15.6096893Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10047 2022-05-18T04:10:15.6199041Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10048 2022-05-18T04:10:15.6302537Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 10049 2022-05-18T04:10:15.6407837Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 10050 2022-05-18T04:10:16.6021093Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpef0wbugg 2022-05-18T04:10:16.6022786Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpef0wbugg/_remote_module_non_scriptable.py 2022-05-18T04:10:16.6048415Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps1olx1rz 2022-05-18T04:10:16.6051427Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps1olx1rz/_remote_module_non_scriptable.py 2022-05-18T04:10:16.6074807Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5el2l3tt 2022-05-18T04:10:16.6077603Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5el2l3tt/_remote_module_non_scriptable.py 2022-05-18T04:10:16.6087927Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7bz1yyx4 2022-05-18T04:10:16.6090821Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7bz1yyx4/_remote_module_non_scriptable.py 2022-05-18T04:10:17.0090898Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:10:17.0188936Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:10:17.0198494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:10:17.0220091Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:10:22.5585595Z ok (8.528s) 2022-05-18T04:10:22.5587974Z 2022-05-18T04:10:22.5589011Z ---------------------------------------------------------------------- 2022-05-18T04:10:22.5589633Z Ran 1 test in 8.528s 2022-05-18T04:10:22.5589907Z 2022-05-18T04:10:22.5590084Z OK 2022-05-18T04:10:22.5590316Z 2022-05-18T04:10:22.5590537Z Generating XML reports... 2022-05-18T04:10:22.5633127Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041014.xml 2022-05-18T04:10:23.7133197Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmfkqi2dx 2022-05-18T04:10:23.7134026Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmfkqi2dx/_remote_module_non_scriptable.py 2022-05-18T04:10:24.1150614Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:10:24.1164821Z 2022-05-18T04:10:24.1165193Z Running tests... 2022-05-18T04:10:24.1165670Z ---------------------------------------------------------------------- 2022-05-18T04:10:25.6752257Z test_device_map_gpu_mixed_1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
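Editor's note: the cpu_to_gpu and gpu default/non-default tests reported above differ only in how the caller's devices are mapped onto the callee. A hedged sketch of the corresponding caller-side option objects; "worker1" and the device indices are illustrative assumptions, not values from these tests.

    import torch.distributed.rpc as rpc

    cpu_to_gpu = rpc.TensorPipeRpcBackendOptions()
    cpu_to_gpu.set_device_map("worker1", {"cpu": "cuda:0"})      # cpu -> callee's default GPU

    cpu_to_gpu_alt = rpc.TensorPipeRpcBackendOptions()
    cpu_to_gpu_alt.set_device_map("worker1", {"cpu": "cuda:1"})  # cpu -> a non-default GPU

    gpu_remap = rpc.TensorPipeRpcBackendOptions()
    gpu_remap.set_device_map("worker1", {"cuda:0": "cuda:1"})    # default GPU -> non-default GPU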
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:10:25.7135700Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10402 2022-05-18T04:10:25.7242552Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10403 2022-05-18T04:10:25.7351289Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 10404 2022-05-18T04:10:25.7460811Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 10405 2022-05-18T04:10:26.6901910Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo1yrz6l9 2022-05-18T04:10:26.6902819Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo1yrz6l9/_remote_module_non_scriptable.py 2022-05-18T04:10:26.6912151Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpux0a5rvi 2022-05-18T04:10:26.6914878Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpux0a5rvi/_remote_module_non_scriptable.py 2022-05-18T04:10:26.7141041Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6gshamvs 2022-05-18T04:10:26.7143747Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6gshamvs/_remote_module_non_scriptable.py 2022-05-18T04:10:26.7207590Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9xaokbxh 2022-05-18T04:10:26.7210391Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9xaokbxh/_remote_module_non_scriptable.py 2022-05-18T04:10:27.0936994Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:10:27.0946750Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:10:27.1172894Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:10:27.1307935Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:10:32.6642654Z ok (8.547s) 2022-05-18T04:10:32.6643088Z 2022-05-18T04:10:32.6643516Z ---------------------------------------------------------------------- 2022-05-18T04:10:32.6643851Z Ran 1 test in 8.548s 2022-05-18T04:10:32.6644025Z 2022-05-18T04:10:32.6644121Z OK 2022-05-18T04:10:32.6644257Z 2022-05-18T04:10:32.6644396Z Generating XML reports... 2022-05-18T04:10:32.6688285Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041024.xml 2022-05-18T04:10:33.8417977Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkmcd6bgl 2022-05-18T04:10:33.8418659Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkmcd6bgl/_remote_module_non_scriptable.py 2022-05-18T04:10:34.2436390Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:10:34.2450653Z 2022-05-18T04:10:34.2450871Z Running tests... 2022-05-18T04:10:34.2451312Z ---------------------------------------------------------------------- 2022-05-18T04:10:35.7798576Z test_device_map_gpu_mixed_2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:10:35.8180138Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10749 2022-05-18T04:10:35.8285615Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10750 2022-05-18T04:10:35.8392537Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 10751 2022-05-18T04:10:35.8498810Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 10752 2022-05-18T04:10:36.7268821Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpngufhhnn 2022-05-18T04:10:36.7270350Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpngufhhnn/_remote_module_non_scriptable.py 2022-05-18T04:10:36.7729543Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpay271ytx 2022-05-18T04:10:36.7731925Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpay271ytx/_remote_module_non_scriptable.py 2022-05-18T04:10:36.7876275Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb71ta7xr 2022-05-18T04:10:36.7878978Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb71ta7xr/_remote_module_non_scriptable.py 2022-05-18T04:10:36.8113405Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw_78u5rd 2022-05-18T04:10:36.8116189Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw_78u5rd/_remote_module_non_scriptable.py 2022-05-18T04:10:37.1241655Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:10:37.1801891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:10:37.1981451Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:10:37.2196461Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:10:42.7706165Z ok (8.525s) 2022-05-18T04:10:42.7706517Z 2022-05-18T04:10:42.7706998Z ---------------------------------------------------------------------- 2022-05-18T04:10:42.7707345Z Ran 1 test in 8.526s 2022-05-18T04:10:42.7707493Z 2022-05-18T04:10:42.7707590Z OK 2022-05-18T04:10:42.7707725Z 2022-05-18T04:10:42.7707860Z Generating XML reports... 2022-05-18T04:10:42.7750386Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041034.xml 2022-05-18T04:10:43.9401537Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxsrcowuz 2022-05-18T04:10:43.9402562Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxsrcowuz/_remote_module_non_scriptable.py 2022-05-18T04:10:44.3469914Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:10:44.3484770Z 2022-05-18T04:10:44.3485227Z Running tests... 2022-05-18T04:10:44.3485717Z ---------------------------------------------------------------------- 2022-05-18T04:10:45.9110298Z test_device_map_gpu_mixed_3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:10:45.9489300Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11096 2022-05-18T04:10:45.9595220Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11097 2022-05-18T04:10:45.9703309Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 11098 2022-05-18T04:10:45.9809109Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 11099 2022-05-18T04:10:46.9262041Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphjrs_o_9 2022-05-18T04:10:46.9263154Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphjrs_o_9/_remote_module_non_scriptable.py 2022-05-18T04:10:46.9293938Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsdst65l1 2022-05-18T04:10:46.9297069Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsdst65l1/_remote_module_non_scriptable.py 2022-05-18T04:10:46.9682993Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfe1awwhd 2022-05-18T04:10:46.9685005Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfe1awwhd/_remote_module_non_scriptable.py 2022-05-18T04:10:46.9723897Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7lj6bu6q 2022-05-18T04:10:46.9726152Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7lj6bu6q/_remote_module_non_scriptable.py 2022-05-18T04:10:47.3328503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:10:47.3375466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:10:47.3737035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:10:47.3838540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:10:52.8990218Z ok (8.550s) 2022-05-18T04:10:52.8990604Z 2022-05-18T04:10:52.8991111Z ---------------------------------------------------------------------- 2022-05-18T04:10:52.8991459Z Ran 1 test in 8.550s 2022-05-18T04:10:52.8991625Z 2022-05-18T04:10:52.8991719Z OK 2022-05-18T04:10:52.8991835Z 2022-05-18T04:10:52.8991973Z Generating XML reports... 2022-05-18T04:10:52.9035207Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041044.xml 2022-05-18T04:10:54.0571597Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp1e6ik0v 2022-05-18T04:10:54.0572439Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp1e6ik0v/_remote_module_non_scriptable.py 2022-05-18T04:10:54.4696341Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:10:54.4711307Z 2022-05-18T04:10:54.4711451Z Running tests... 2022-05-18T04:10:54.4712117Z ---------------------------------------------------------------------- 2022-05-18T04:10:56.0593513Z test_device_map_gpu_mixed_4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:10:56.0982516Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11443 2022-05-18T04:10:56.1087675Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11444 2022-05-18T04:10:56.1195883Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 11445 2022-05-18T04:10:56.1303267Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 11446 2022-05-18T04:10:57.0342342Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfwurfk_h 2022-05-18T04:10:57.0343512Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfwurfk_h/_remote_module_non_scriptable.py 2022-05-18T04:10:57.0656902Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcolnsbde 2022-05-18T04:10:57.0659754Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcolnsbde/_remote_module_non_scriptable.py 2022-05-18T04:10:57.0679870Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnx06gw_2 2022-05-18T04:10:57.0683235Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnx06gw_2/_remote_module_non_scriptable.py 2022-05-18T04:10:57.0758859Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8nklnd4e 2022-05-18T04:10:57.0761473Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8nklnd4e/_remote_module_non_scriptable.py 2022-05-18T04:10:57.4345582Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:10:57.4677821Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:10:57.4791459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:10:57.4794780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:11:03.0505500Z ok (8.579s) 2022-05-18T04:11:03.0505720Z 2022-05-18T04:11:03.0506136Z ---------------------------------------------------------------------- 2022-05-18T04:11:03.0506457Z Ran 1 test in 8.579s 2022-05-18T04:11:03.0506624Z 2022-05-18T04:11:03.0506740Z OK 2022-05-18T04:11:03.0506877Z 2022-05-18T04:11:03.0507010Z Generating XML reports... 2022-05-18T04:11:03.0549857Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041054.xml 2022-05-18T04:11:04.2236637Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2_xgk5iw 2022-05-18T04:11:04.2237832Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2_xgk5iw/_remote_module_non_scriptable.py 2022-05-18T04:11:04.6333561Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:11:04.6349014Z 2022-05-18T04:11:04.6349161Z Running tests... 2022-05-18T04:11:04.6349818Z ---------------------------------------------------------------------- 2022-05-18T04:11:06.2174205Z test_device_map_gpu_mixed_5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:11:06.2563864Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11790 2022-05-18T04:11:06.2670706Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11791 2022-05-18T04:11:06.2776034Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 11792 2022-05-18T04:11:06.2883042Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 11793 2022-05-18T04:11:07.1831093Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq42er1vo 2022-05-18T04:11:07.1831994Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq42er1vo/_remote_module_non_scriptable.py 2022-05-18T04:11:07.2059540Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc_97muoe 2022-05-18T04:11:07.2062410Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc_97muoe/_remote_module_non_scriptable.py 2022-05-18T04:11:07.2138527Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj1tv4rp9 2022-05-18T04:11:07.2141267Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj1tv4rp9/_remote_module_non_scriptable.py 2022-05-18T04:11:07.2602595Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpier5tb7a 2022-05-18T04:11:07.2605364Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpier5tb7a/_remote_module_non_scriptable.py 2022-05-18T04:11:07.5910029Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:11:07.6165677Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:11:07.6196892Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:11:07.6610968Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:11:13.2064081Z ok (8.571s) 2022-05-18T04:11:13.2064311Z 2022-05-18T04:11:13.2065053Z ---------------------------------------------------------------------- 2022-05-18T04:11:13.2065415Z Ran 1 test in 8.571s 2022-05-18T04:11:13.2067940Z 2022-05-18T04:11:13.2068377Z OK 2022-05-18T04:11:13.2068544Z 2022-05-18T04:11:13.2068684Z Generating XML reports... 2022-05-18T04:11:13.2109756Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041104.xml 2022-05-18T04:11:14.3859733Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9gflzx0j 2022-05-18T04:11:14.3860624Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9gflzx0j/_remote_module_non_scriptable.py 2022-05-18T04:11:14.8015366Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:11:14.8030318Z 2022-05-18T04:11:14.8030656Z Running tests... 2022-05-18T04:11:14.8031135Z ---------------------------------------------------------------------- 2022-05-18T04:11:16.3922350Z test_device_map_gpu_mixed_6 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:11:16.4311778Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12145 2022-05-18T04:11:16.4416005Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12146 2022-05-18T04:11:16.4521427Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 12147 2022-05-18T04:11:16.4628708Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 12148 2022-05-18T04:11:17.3614421Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl6rx9vg1 2022-05-18T04:11:17.3615328Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl6rx9vg1/_remote_module_non_scriptable.py 2022-05-18T04:11:17.3695552Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzezuf59w 2022-05-18T04:11:17.3698091Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzezuf59w/_remote_module_non_scriptable.py 2022-05-18T04:11:17.3934440Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvgineib7 2022-05-18T04:11:17.3937559Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvgineib7/_remote_module_non_scriptable.py 2022-05-18T04:11:17.4289594Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps4orsusl 2022-05-18T04:11:17.4292772Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps4orsusl/_remote_module_non_scriptable.py 2022-05-18T04:11:17.7639875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:11:17.7797508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:11:17.7937897Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:11:17.8308392Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:11:23.3811970Z ok (8.578s) 2022-05-18T04:11:23.3812246Z 2022-05-18T04:11:23.3812645Z ---------------------------------------------------------------------- 2022-05-18T04:11:23.3812991Z Ran 1 test in 8.578s 2022-05-18T04:11:23.3813159Z 2022-05-18T04:11:23.3814391Z OK 2022-05-18T04:11:23.3814730Z 2022-05-18T04:11:23.3815050Z Generating XML reports... 2022-05-18T04:11:23.3856080Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041114.xml 2022-05-18T04:11:24.5515429Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_g4gib5p 2022-05-18T04:11:24.5516302Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_g4gib5p/_remote_module_non_scriptable.py 2022-05-18T04:11:24.9520156Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:11:24.9534554Z 2022-05-18T04:11:24.9534976Z Running tests... 2022-05-18T04:11:24.9535836Z ---------------------------------------------------------------------- 2022-05-18T04:11:26.5014353Z test_device_map_gpu_mixed_7 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:11:26.5397382Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12500 2022-05-18T04:11:26.5501243Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12501 2022-05-18T04:11:26.5606146Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 12502 2022-05-18T04:11:26.5713017Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 12503 2022-05-18T04:11:27.5045166Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcu0blxvg 2022-05-18T04:11:27.5046272Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcu0blxvg/_remote_module_non_scriptable.py 2022-05-18T04:11:27.5110554Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2ovsqbxd 2022-05-18T04:11:27.5113348Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2ovsqbxd/_remote_module_non_scriptable.py 2022-05-18T04:11:27.5114266Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp90zru0u0 2022-05-18T04:11:27.5117519Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp90zru0u0/_remote_module_non_scriptable.py 2022-05-18T04:11:27.5376512Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1jcfw42g 2022-05-18T04:11:27.5379218Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1jcfw42g/_remote_module_non_scriptable.py 2022-05-18T04:11:27.9064981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:11:27.9138291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:11:27.9151910Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:11:27.9484106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:11:33.5896935Z ok (8.636s) 2022-05-18T04:11:33.5897169Z 2022-05-18T04:11:33.5897577Z ---------------------------------------------------------------------- 2022-05-18T04:11:33.5897916Z Ran 1 test in 8.636s 2022-05-18T04:11:33.5898085Z 2022-05-18T04:11:33.5898184Z OK 2022-05-18T04:11:33.5898345Z 2022-05-18T04:11:33.5898481Z Generating XML reports... 2022-05-18T04:11:33.5941642Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041124.xml 2022-05-18T04:11:34.7652209Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptbvnh7o9 2022-05-18T04:11:34.7653061Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptbvnh7o9/_remote_module_non_scriptable.py 2022-05-18T04:11:35.1763413Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:11:35.1778000Z 2022-05-18T04:11:35.1778232Z Running tests... 2022-05-18T04:11:35.1778646Z ---------------------------------------------------------------------- 2022-05-18T04:11:36.7435155Z test_device_map_gpu_mixed_8 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:11:36.7819638Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12855 2022-05-18T04:11:36.7925204Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12856 2022-05-18T04:11:36.8034444Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 12857 2022-05-18T04:11:36.8142936Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 12858 2022-05-18T04:11:37.7130168Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_0aorhy4 2022-05-18T04:11:37.7131267Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_0aorhy4/_remote_module_non_scriptable.py 2022-05-18T04:11:37.7140505Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9na3c6fo 2022-05-18T04:11:37.7143455Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9na3c6fo/_remote_module_non_scriptable.py 2022-05-18T04:11:37.7144007Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjcle4a8f 2022-05-18T04:11:37.7147018Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjcle4a8f/_remote_module_non_scriptable.py 2022-05-18T04:11:37.7611384Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi6t9aq5c 2022-05-18T04:11:37.7613775Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi6t9aq5c/_remote_module_non_scriptable.py 2022-05-18T04:11:38.1176618Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:11:38.1217800Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:11:38.1227389Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:11:38.1724890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:11:43.7328262Z ok (8.555s) 2022-05-18T04:11:43.7329352Z 2022-05-18T04:11:43.7329764Z ---------------------------------------------------------------------- 2022-05-18T04:11:43.7330481Z Ran 1 test in 8.555s 2022-05-18T04:11:43.7330649Z 2022-05-18T04:11:43.7330745Z OK 2022-05-18T04:11:43.7330864Z 2022-05-18T04:11:43.7331001Z Generating XML reports... 2022-05-18T04:11:43.7385455Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041135.xml 2022-05-18T04:11:44.9002038Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf9th4_nu 2022-05-18T04:11:44.9003068Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf9th4_nu/_remote_module_non_scriptable.py 2022-05-18T04:11:45.3152746Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:11:45.3167111Z 2022-05-18T04:11:45.3167437Z Running tests... 2022-05-18T04:11:45.3167902Z ---------------------------------------------------------------------- 2022-05-18T04:11:46.9219297Z test_device_map_gpu_mixed_self_1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
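Editor's note: the eight gpu_mixed variants above each pass in roughly 8.5 s and appear to cover device maps with more than one entry. A hedged sketch of such a multi-entry map; the particular pairings are illustrative, not the tests' own.

    import torch.distributed.rpc as rpc

    opts = rpc.TensorPipeRpcBackendOptions()
    # A "mixed" map: several caller GPUs routed to different callee GPUs in one map.
    opts.set_device_map("worker1", {
        "cuda:0": "cuda:1",
        "cuda:1": "cuda:0",
    })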
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:11:46.9610767Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13210 2022-05-18T04:11:46.9716859Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13211 2022-05-18T04:11:46.9825358Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 13212 2022-05-18T04:11:46.9934362Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 13213 2022-05-18T04:11:47.8596598Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0ps1__v4 2022-05-18T04:11:47.8597765Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0ps1__v4/_remote_module_non_scriptable.py 2022-05-18T04:11:47.8858503Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9bemxdjt 2022-05-18T04:11:47.8860364Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9bemxdjt/_remote_module_non_scriptable.py 2022-05-18T04:11:47.9016321Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp70dgdby1 2022-05-18T04:11:47.9018440Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp70dgdby1/_remote_module_non_scriptable.py 2022-05-18T04:11:47.9421084Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1rkw03zx 2022-05-18T04:11:47.9423397Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1rkw03zx/_remote_module_non_scriptable.py 2022-05-18T04:11:48.2630055Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:11:48.2854631Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:11:48.3001596Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:11:48.3492534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:11:53.9138670Z ok (8.597s) 2022-05-18T04:11:53.9138996Z 2022-05-18T04:11:53.9139850Z ---------------------------------------------------------------------- 2022-05-18T04:11:53.9140328Z Ran 1 test in 8.597s 2022-05-18T04:11:53.9140497Z 2022-05-18T04:11:53.9140592Z OK 2022-05-18T04:11:53.9140728Z 2022-05-18T04:11:53.9140842Z Generating XML reports... 2022-05-18T04:11:53.9183848Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041145.xml 2022-05-18T04:11:55.0814003Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx_xgu4s0 2022-05-18T04:11:55.0815151Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx_xgu4s0/_remote_module_non_scriptable.py 2022-05-18T04:11:55.4874068Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:11:55.4888529Z 2022-05-18T04:11:55.4888649Z Running tests... 2022-05-18T04:11:55.4889327Z ---------------------------------------------------------------------- 2022-05-18T04:11:57.0584631Z test_device_map_gpu_mixed_self_2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:11:57.0968475Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13557 2022-05-18T04:11:57.1074571Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13558 2022-05-18T04:11:57.1182075Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 13559 2022-05-18T04:11:57.1289619Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 13560 2022-05-18T04:11:58.0128878Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5fg53bui 2022-05-18T04:11:58.0129771Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5fg53bui/_remote_module_non_scriptable.py 2022-05-18T04:11:58.0172170Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphscyhf6u 2022-05-18T04:11:58.0175117Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphscyhf6u/_remote_module_non_scriptable.py 2022-05-18T04:11:58.0191728Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptgzp5t9v 2022-05-18T04:11:58.0194538Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptgzp5t9v/_remote_module_non_scriptable.py 2022-05-18T04:11:58.0221622Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxk4ssxpx 2022-05-18T04:11:58.0224813Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxk4ssxpx/_remote_module_non_scriptable.py 2022-05-18T04:11:58.4149348Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:11:58.4199968Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:11:58.4367579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:11:58.4383467Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:12:04.0469575Z ok (8.558s) 2022-05-18T04:12:04.0469798Z 2022-05-18T04:12:04.0470204Z ---------------------------------------------------------------------- 2022-05-18T04:12:04.0470549Z Ran 1 test in 8.558s 2022-05-18T04:12:04.0470716Z 2022-05-18T04:12:04.0470792Z OK 2022-05-18T04:12:04.0470929Z 2022-05-18T04:12:04.0471067Z Generating XML reports... 2022-05-18T04:12:04.0513474Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041155.xml 2022-05-18T04:12:05.2058506Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp47fwolfl 2022-05-18T04:12:05.2059552Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp47fwolfl/_remote_module_non_scriptable.py 2022-05-18T04:12:05.6135854Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:12:05.6150522Z 2022-05-18T04:12:05.6150783Z Running tests... 2022-05-18T04:12:05.6151234Z ---------------------------------------------------------------------- 2022-05-18T04:12:07.1924591Z test_device_map_gpu_mixed_self_3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:12:07.2312904Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13904 2022-05-18T04:12:07.2417683Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13905 2022-05-18T04:12:07.2526096Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 13906 2022-05-18T04:12:07.2633089Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 13907 2022-05-18T04:12:08.1401183Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7jby3yv6 2022-05-18T04:12:08.1402351Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7jby3yv6/_remote_module_non_scriptable.py 2022-05-18T04:12:08.1705154Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7g97yzhg 2022-05-18T04:12:08.1707853Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7g97yzhg/_remote_module_non_scriptable.py 2022-05-18T04:12:08.2043042Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvqcfwcx7 2022-05-18T04:12:08.2045775Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvqcfwcx7/_remote_module_non_scriptable.py 2022-05-18T04:12:08.2049720Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_ww0vu0m 2022-05-18T04:12:08.2052990Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_ww0vu0m/_remote_module_non_scriptable.py 2022-05-18T04:12:08.5494329Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:12:08.5704743Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:12:08.6036797Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:12:08.6055720Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:12:14.1814996Z ok (8.566s) 2022-05-18T04:12:14.1815394Z 2022-05-18T04:12:14.1815917Z ---------------------------------------------------------------------- 2022-05-18T04:12:14.1816307Z Ran 1 test in 8.566s 2022-05-18T04:12:14.1816460Z 2022-05-18T04:12:14.1816559Z OK 2022-05-18T04:12:14.1816696Z 2022-05-18T04:12:14.1816832Z Generating XML reports... 2022-05-18T04:12:14.1861706Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041205.xml 2022-05-18T04:12:15.3651115Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9x8onh19 2022-05-18T04:12:15.3652472Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9x8onh19/_remote_module_non_scriptable.py 2022-05-18T04:12:15.7750944Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:12:15.7765842Z 2022-05-18T04:12:15.7765986Z Running tests... 2022-05-18T04:12:15.7766670Z ---------------------------------------------------------------------- 2022-05-18T04:12:17.3622928Z test_device_map_gpu_mixed_self_4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:12:17.4014523Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14251 2022-05-18T04:12:17.4121739Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14252 2022-05-18T04:12:17.4230504Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 14253 2022-05-18T04:12:17.4338615Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 14254 2022-05-18T04:12:18.3010847Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd8kg5w9p 2022-05-18T04:12:18.3011975Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd8kg5w9p/_remote_module_non_scriptable.py 2022-05-18T04:12:18.3434415Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplm4m6ira 2022-05-18T04:12:18.3436122Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplm4m6ira/_remote_module_non_scriptable.py 2022-05-18T04:12:18.3721697Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7geme_nz 2022-05-18T04:12:18.3723991Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7geme_nz/_remote_module_non_scriptable.py 2022-05-18T04:12:18.3784007Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfiw34ruf 2022-05-18T04:12:18.3786672Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfiw34ruf/_remote_module_non_scriptable.py 2022-05-18T04:12:18.7038306Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:12:18.7409998Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:12:18.7789122Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:12:18.7851995Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:12:24.3526969Z ok (8.576s) 2022-05-18T04:12:24.3532299Z 2022-05-18T04:12:24.3533056Z ---------------------------------------------------------------------- 2022-05-18T04:12:24.3534269Z Ran 1 test in 8.576s 2022-05-18T04:12:24.3534626Z 2022-05-18T04:12:24.3534800Z OK 2022-05-18T04:12:24.3535057Z 2022-05-18T04:12:24.3535281Z Generating XML reports... 2022-05-18T04:12:24.3578196Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041215.xml 2022-05-18T04:12:25.5406443Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4sbdqe9t 2022-05-18T04:12:25.5407657Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4sbdqe9t/_remote_module_non_scriptable.py 2022-05-18T04:12:25.9538799Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:12:25.9553957Z 2022-05-18T04:12:25.9554273Z Running tests... 2022-05-18T04:12:25.9554719Z ---------------------------------------------------------------------- 2022-05-18T04:12:27.5456496Z test_device_map_gpu_mixed_self_5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:12:27.5846235Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14598 2022-05-18T04:12:27.5953815Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14599 2022-05-18T04:12:27.6062478Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 14600 2022-05-18T04:12:27.6172971Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 14601 2022-05-18T04:12:28.5100328Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpliw_vwid 2022-05-18T04:12:28.5101385Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpliw_vwid/_remote_module_non_scriptable.py 2022-05-18T04:12:28.5587696Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm1wvarel 2022-05-18T04:12:28.5588247Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgstnttgj 2022-05-18T04:12:28.5590253Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm1wvarel/_remote_module_non_scriptable.py 2022-05-18T04:12:28.5591038Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgstnttgj/_remote_module_non_scriptable.py 2022-05-18T04:12:28.5600430Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0bme2hlf 2022-05-18T04:12:28.5603389Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0bme2hlf/_remote_module_non_scriptable.py 2022-05-18T04:12:28.9093732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:12:28.9600327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:12:28.9627681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:12:28.9648670Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:12:34.5374674Z ok (8.582s) 2022-05-18T04:12:34.5374879Z 2022-05-18T04:12:34.5375290Z ---------------------------------------------------------------------- 2022-05-18T04:12:34.5375632Z Ran 1 test in 8.582s 2022-05-18T04:12:34.5375806Z 2022-05-18T04:12:34.5375902Z OK 2022-05-18T04:12:34.5376040Z 2022-05-18T04:12:34.5376174Z Generating XML reports... 2022-05-18T04:12:34.5421060Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041225.xml 2022-05-18T04:12:35.6983148Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu0m5m8ks 2022-05-18T04:12:35.6984333Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu0m5m8ks/_remote_module_non_scriptable.py 2022-05-18T04:12:36.0985835Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:12:36.1000420Z 2022-05-18T04:12:36.1000668Z Running tests... 2022-05-18T04:12:36.1001108Z ---------------------------------------------------------------------- 2022-05-18T04:12:37.6428784Z test_device_map_gpu_mixed_self_6 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:12:37.6811813Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14945 2022-05-18T04:12:37.6917528Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14946 2022-05-18T04:12:37.7022311Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 14947 2022-05-18T04:12:37.7129864Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 14948 2022-05-18T04:12:38.6235785Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr7e4exmk 2022-05-18T04:12:38.6236883Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr7e4exmk/_remote_module_non_scriptable.py 2022-05-18T04:12:38.6371865Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbkxr096f 2022-05-18T04:12:38.6374275Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbkxr096f/_remote_module_non_scriptable.py 2022-05-18T04:12:38.6471447Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkfzlm13y 2022-05-18T04:12:38.6474264Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkfzlm13y/_remote_module_non_scriptable.py 2022-05-18T04:12:38.6715785Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkjxr0roa 2022-05-18T04:12:38.6718607Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkjxr0roa/_remote_module_non_scriptable.py 2022-05-18T04:12:39.0222589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:12:39.0356066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:12:39.0467595Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:12:39.0848290Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:12:44.6339173Z ok (8.534s) 2022-05-18T04:12:44.6339398Z 2022-05-18T04:12:44.6340132Z ---------------------------------------------------------------------- 2022-05-18T04:12:44.6340499Z Ran 1 test in 8.534s 2022-05-18T04:12:44.6340667Z 2022-05-18T04:12:44.6341691Z OK 2022-05-18T04:12:44.6341888Z 2022-05-18T04:12:44.6342280Z Generating XML reports... 2022-05-18T04:12:44.6383215Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041236.xml 2022-05-18T04:12:45.7877512Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpocehhj6e 2022-05-18T04:12:45.7878649Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpocehhj6e/_remote_module_non_scriptable.py 2022-05-18T04:12:46.1882073Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:12:46.1898059Z 2022-05-18T04:12:46.1898287Z Running tests... 2022-05-18T04:12:46.1898736Z ---------------------------------------------------------------------- 2022-05-18T04:12:47.7442020Z test_device_map_gpu_mixed_self_7 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:12:47.7823225Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15292 2022-05-18T04:12:47.7932154Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15293 2022-05-18T04:12:47.8039618Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 15294 2022-05-18T04:12:47.8148315Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 15295 2022-05-18T04:12:48.7086459Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxird7xrm 2022-05-18T04:12:48.7087364Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxird7xrm/_remote_module_non_scriptable.py 2022-05-18T04:12:48.7383309Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_egz3y4j 2022-05-18T04:12:48.7386013Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_egz3y4j/_remote_module_non_scriptable.py 2022-05-18T04:12:48.7711200Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcllspfr1 2022-05-18T04:12:48.7713915Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcllspfr1/_remote_module_non_scriptable.py 2022-05-18T04:12:48.7756801Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpextsvs37 2022-05-18T04:12:48.7759638Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpextsvs37/_remote_module_non_scriptable.py 2022-05-18T04:12:49.1073053Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:12:49.1376555Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:12:49.1818022Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:12:49.1835121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:12:54.7333804Z ok (8.543s) 2022-05-18T04:12:54.7334159Z 2022-05-18T04:12:54.7334769Z ---------------------------------------------------------------------- 2022-05-18T04:12:54.7335260Z Ran 1 test in 8.544s 2022-05-18T04:12:54.7335439Z 2022-05-18T04:12:54.7335535Z OK 2022-05-18T04:12:54.7335673Z 2022-05-18T04:12:54.7335809Z Generating XML reports... 2022-05-18T04:12:54.7378396Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041246.xml 2022-05-18T04:12:55.8909886Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8oqz0n8z 2022-05-18T04:12:55.8910803Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8oqz0n8z/_remote_module_non_scriptable.py 2022-05-18T04:12:56.2905241Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:12:56.2920225Z 2022-05-18T04:12:56.2920473Z Running tests... 2022-05-18T04:12:56.2921166Z ---------------------------------------------------------------------- 2022-05-18T04:12:57.8324047Z test_device_map_gpu_mixed_self_8 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:12:57.8711344Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15639 2022-05-18T04:12:57.8817496Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15640 2022-05-18T04:12:57.8924445Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 15641 2022-05-18T04:12:57.9031541Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 15642 2022-05-18T04:12:58.8404113Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoeog6gdw 2022-05-18T04:12:58.8405042Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoeog6gdw/_remote_module_non_scriptable.py 2022-05-18T04:12:58.8506359Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnrtuofeb 2022-05-18T04:12:58.8509566Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnrtuofeb/_remote_module_non_scriptable.py 2022-05-18T04:12:58.8671728Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxsw3szsy 2022-05-18T04:12:58.8674847Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxsw3szsy/_remote_module_non_scriptable.py 2022-05-18T04:12:58.8675627Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2q8e99em 2022-05-18T04:12:58.8677911Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2q8e99em/_remote_module_non_scriptable.py 2022-05-18T04:12:59.2456742Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:12:59.2660472Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:12:59.2705571Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:12:59.2758575Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:13:04.8229896Z ok (8.531s) 2022-05-18T04:13:04.8230104Z 2022-05-18T04:13:04.8230479Z ---------------------------------------------------------------------- 2022-05-18T04:13:04.8230844Z Ran 1 test in 8.531s 2022-05-18T04:13:04.8231011Z 2022-05-18T04:13:04.8231104Z OK 2022-05-18T04:13:04.8231262Z 2022-05-18T04:13:04.8231395Z Generating XML reports... 2022-05-18T04:13:04.8276231Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041256.xml 2022-05-18T04:13:06.0022170Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9nkx_g7i 2022-05-18T04:13:06.0023316Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9nkx_g7i/_remote_module_non_scriptable.py 2022-05-18T04:13:06.4160864Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:13:06.4175640Z 2022-05-18T04:13:06.4175875Z Running tests... 2022-05-18T04:13:06.4176308Z ---------------------------------------------------------------------- 2022-05-18T04:13:07.9955224Z test_device_map_gpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:13:08.0346975Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15986 2022-05-18T04:13:08.0451839Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15987 2022-05-18T04:13:08.0560768Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 15988 2022-05-18T04:13:08.0670519Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 15989 2022-05-18T04:13:08.9391500Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpktphussd 2022-05-18T04:13:08.9392531Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpktphussd/_remote_module_non_scriptable.py 2022-05-18T04:13:08.9472493Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu1jz5vs4 2022-05-18T04:13:08.9475054Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu1jz5vs4/_remote_module_non_scriptable.py 2022-05-18T04:13:08.9826576Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk_tta38i 2022-05-18T04:13:08.9829122Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk_tta38i/_remote_module_non_scriptable.py 2022-05-18T04:13:09.0031373Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbfw4we2w 2022-05-18T04:13:09.0033886Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbfw4we2w/_remote_module_non_scriptable.py 2022-05-18T04:13:09.3403051Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:13:09.3654693Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:13:09.3832268Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:13:09.3991606Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:13:12.6813533Z ok (6.263s) 2022-05-18T04:13:12.6813879Z 2022-05-18T04:13:12.6814550Z ---------------------------------------------------------------------- 2022-05-18T04:13:12.6815322Z Ran 1 test in 6.264s 2022-05-18T04:13:12.6815488Z 2022-05-18T04:13:12.6815563Z OK 2022-05-18T04:13:12.6815697Z 2022-05-18T04:13:12.6815826Z Generating XML reports... 2022-05-18T04:13:12.6858899Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041306.xml 2022-05-18T04:13:13.8551065Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpibdaho5z 2022-05-18T04:13:13.8552294Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpibdaho5z/_remote_module_non_scriptable.py 2022-05-18T04:13:14.2641104Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:13:14.2656043Z 2022-05-18T04:13:14.2656276Z Running tests... 2022-05-18T04:13:14.2656715Z ---------------------------------------------------------------------- 2022-05-18T04:13:15.8666751Z test_device_map_gpu_non_default_to_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:13:15.9067216Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16329 2022-05-18T04:13:15.9176796Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16330 2022-05-18T04:13:15.9285584Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 16331 2022-05-18T04:13:15.9400414Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 16332 2022-05-18T04:13:16.8446735Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu7dj_nz5 2022-05-18T04:13:16.8447979Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu7dj_nz5/_remote_module_non_scriptable.py 2022-05-18T04:13:16.8701595Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd0rk2vj6 2022-05-18T04:13:16.8703925Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd0rk2vj6/_remote_module_non_scriptable.py 2022-05-18T04:13:16.8773129Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxrf6gy1o 2022-05-18T04:13:16.8776027Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxrf6gy1o/_remote_module_non_scriptable.py 2022-05-18T04:13:16.8797552Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpejcn1y68 2022-05-18T04:13:16.8800071Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpejcn1y68/_remote_module_non_scriptable.py 2022-05-18T04:13:17.2441900Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:13:17.2789767Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:13:17.2828025Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:13:17.2934513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:13:22.8581137Z ok (8.592s) 2022-05-18T04:13:22.8581426Z 2022-05-18T04:13:22.8581834Z ---------------------------------------------------------------------- 2022-05-18T04:13:22.8582172Z Ran 1 test in 8.592s 2022-05-18T04:13:22.8582336Z 2022-05-18T04:13:22.8582429Z OK 2022-05-18T04:13:22.8582546Z 2022-05-18T04:13:22.8583542Z Generating XML reports... 2022-05-18T04:13:22.8625719Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041314.xml 2022-05-18T04:13:24.0228774Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcqzhla5d 2022-05-18T04:13:24.0230797Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcqzhla5d/_remote_module_non_scriptable.py 2022-05-18T04:13:24.4339718Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:13:24.4355358Z 2022-05-18T04:13:24.4355589Z Running tests... 2022-05-18T04:13:24.4356029Z ---------------------------------------------------------------------- 2022-05-18T04:13:26.0104031Z test_device_map_gpu_to_cpu_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:13:26.0497083Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16684 2022-05-18T04:13:26.0605321Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16685 2022-05-18T04:13:26.0713563Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 16686 2022-05-18T04:13:26.0825746Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 16687 2022-05-18T04:13:26.9752354Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5546n92v 2022-05-18T04:13:26.9753375Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5546n92v/_remote_module_non_scriptable.py 2022-05-18T04:13:26.9802333Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpovzybejs 2022-05-18T04:13:26.9805142Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpovzybejs/_remote_module_non_scriptable.py 2022-05-18T04:13:26.9842341Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2qo0he0b 2022-05-18T04:13:26.9845173Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2qo0he0b/_remote_module_non_scriptable.py 2022-05-18T04:13:26.9905655Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6or0926p 2022-05-18T04:13:26.9908689Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6or0926p/_remote_module_non_scriptable.py 2022-05-18T04:13:27.3856309Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:13:27.3867593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:13:27.3928173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:13:27.3943932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:13:30.7954093Z ok (6.359s) 2022-05-18T04:13:30.7954321Z 2022-05-18T04:13:30.7954728Z ---------------------------------------------------------------------- 2022-05-18T04:13:30.7955074Z Ran 1 test in 6.360s 2022-05-18T04:13:30.7955220Z 2022-05-18T04:13:30.7955313Z OK 2022-05-18T04:13:30.7955444Z 2022-05-18T04:13:30.7955579Z Generating XML reports... 2022-05-18T04:13:30.7998896Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041324.xml 2022-05-18T04:13:31.9635877Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_xm0un9q 2022-05-18T04:13:31.9636998Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_xm0un9q/_remote_module_non_scriptable.py 2022-05-18T04:13:32.3715048Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:13:32.3729498Z 2022-05-18T04:13:32.3729735Z Running tests... 2022-05-18T04:13:32.3730173Z ---------------------------------------------------------------------- 2022-05-18T04:13:33.9525137Z test_device_map_gpu_to_cpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:13:33.9909291Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17031 2022-05-18T04:13:34.0015529Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17032 2022-05-18T04:13:34.0120937Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 17033 2022-05-18T04:13:34.0228813Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 17034 2022-05-18T04:13:34.9747725Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcxsb56bw 2022-05-18T04:13:34.9749054Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcxsb56bw/_remote_module_non_scriptable.py 2022-05-18T04:13:34.9838480Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc1zznw01 2022-05-18T04:13:34.9841364Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc1zznw01/_remote_module_non_scriptable.py 2022-05-18T04:13:34.9851442Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplfnfg_j6 2022-05-18T04:13:34.9854615Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplfnfg_j6/_remote_module_non_scriptable.py 2022-05-18T04:13:34.9874119Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf1_jzej_ 2022-05-18T04:13:34.9876998Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf1_jzej_/_remote_module_non_scriptable.py 2022-05-18T04:13:35.3769136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:13:35.3905926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:13:35.3935011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:13:35.4032849Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:13:38.8362068Z ok (6.463s) 2022-05-18T04:13:38.8362425Z 2022-05-18T04:13:38.8363111Z ---------------------------------------------------------------------- 2022-05-18T04:13:38.8363728Z Ran 1 test in 6.463s 2022-05-18T04:13:38.8364031Z 2022-05-18T04:13:38.8364171Z OK 2022-05-18T04:13:38.8364422Z 2022-05-18T04:13:38.8364664Z Generating XML reports... 2022-05-18T04:13:38.8408608Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041332.xml 2022-05-18T04:13:40.0080670Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9uqiw2dm 2022-05-18T04:13:40.0082588Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9uqiw2dm/_remote_module_non_scriptable.py 2022-05-18T04:13:40.4184552Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:13:40.4199340Z 2022-05-18T04:13:40.4199739Z Running tests... 2022-05-18T04:13:40.4200183Z ---------------------------------------------------------------------- 2022-05-18T04:13:41.9996407Z test_device_maps_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:13:42.0380016Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17378 2022-05-18T04:13:42.0486772Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17379 2022-05-18T04:13:42.0595384Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 17380 2022-05-18T04:13:42.0703875Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 17381 2022-05-18T04:13:43.0607078Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzqia4pbn 2022-05-18T04:13:43.0608106Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzqia4pbn/_remote_module_non_scriptable.py 2022-05-18T04:13:43.0624468Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn8x8mifn 2022-05-18T04:13:43.0627747Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn8x8mifn/_remote_module_non_scriptable.py 2022-05-18T04:13:43.0661808Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj2803hnk 2022-05-18T04:13:43.0664820Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj2803hnk/_remote_module_non_scriptable.py 2022-05-18T04:13:43.0897408Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvm5a8yj3 2022-05-18T04:13:43.0900192Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvm5a8yj3/_remote_module_non_scriptable.py 2022-05-18T04:13:43.4756061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:13:43.4787478Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:13:43.4792395Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:13:43.4901375Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:13:48.9908547Z ok (8.571s) 2022-05-18T04:13:48.9908773Z 2022-05-18T04:13:48.9909207Z ---------------------------------------------------------------------- 2022-05-18T04:13:48.9909550Z Ran 1 test in 8.571s 2022-05-18T04:13:48.9909716Z 2022-05-18T04:13:48.9909812Z OK 2022-05-18T04:13:48.9909934Z 2022-05-18T04:13:48.9910095Z Generating XML reports... 2022-05-18T04:13:48.9953149Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041340.xml 2022-05-18T04:13:50.1513662Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaxom90t3 2022-05-18T04:13:50.1514944Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaxom90t3/_remote_module_non_scriptable.py 2022-05-18T04:13:50.5604305Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:13:50.5619473Z 2022-05-18T04:13:50.5619999Z Running tests... 2022-05-18T04:13:50.5620509Z ---------------------------------------------------------------------- 2022-05-18T04:13:52.1449633Z test_device_maps_in_options (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:13:52.1831150Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17733 2022-05-18T04:13:52.1936646Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17734 2022-05-18T04:13:52.2042184Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 17735 2022-05-18T04:13:52.2149686Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 17736 2022-05-18T04:13:53.1179243Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy26hcamx 2022-05-18T04:13:53.1180350Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy26hcamx/_remote_module_non_scriptable.py 2022-05-18T04:13:53.1736811Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp55x2wxyl 2022-05-18T04:13:53.1739109Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp55x2wxyl/_remote_module_non_scriptable.py 2022-05-18T04:13:53.1860305Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprir1l_4g 2022-05-18T04:13:53.1863235Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprir1l_4g/_remote_module_non_scriptable.py 2022-05-18T04:13:53.2115185Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp30fy3kz3 2022-05-18T04:13:53.2117987Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp30fy3kz3/_remote_module_non_scriptable.py 2022-05-18T04:13:53.5147475Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:13:53.5818673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:13:53.5923880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:13:53.6144720Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:13:59.1358203Z ok (8.574s) 2022-05-18T04:13:59.1358440Z 2022-05-18T04:13:59.1358852Z ---------------------------------------------------------------------- 2022-05-18T04:13:59.1359175Z Ran 1 test in 8.574s 2022-05-18T04:13:59.1359363Z 2022-05-18T04:13:59.1359460Z OK 2022-05-18T04:13:59.1359596Z 2022-05-18T04:13:59.1360484Z Generating XML reports... 2022-05-18T04:13:59.1402222Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041350.xml 2022-05-18T04:14:00.3080034Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuutnx2fp 2022-05-18T04:14:00.3081297Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuutnx2fp/_remote_module_non_scriptable.py 2022-05-18T04:14:00.7196271Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:14:00.7210995Z 2022-05-18T04:14:00.7211366Z Running tests... 2022-05-18T04:14:00.7211886Z ---------------------------------------------------------------------- 2022-05-18T04:14:02.2982098Z test_device_maps_invalid_max_local_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:14:02.3368622Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18088 2022-05-18T04:14:02.3477695Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18089 2022-05-18T04:14:02.3587124Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 18090 2022-05-18T04:14:02.3696631Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 18091 2022-05-18T04:14:03.3179656Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdhcywssy 2022-05-18T04:14:03.3180814Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbvh5xfv3 2022-05-18T04:14:03.3181881Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdhcywssy/_remote_module_non_scriptable.py 2022-05-18T04:14:03.3182981Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbvh5xfv3/_remote_module_non_scriptable.py 2022-05-18T04:14:03.3214282Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp35_03y7n 2022-05-18T04:14:03.3216422Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp35_03y7n/_remote_module_non_scriptable.py 2022-05-18T04:14:03.3231712Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq5hh6t5b 2022-05-18T04:14:03.3234450Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq5hh6t5b/_remote_module_non_scriptable.py 2022-05-18T04:14:03.7233376Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:14:03.7245459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:14:03.7274825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:14:03.7388836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:14:03.9753611Z ok (3.254s) 2022-05-18T04:14:03.9753823Z 2022-05-18T04:14:03.9754534Z ---------------------------------------------------------------------- 2022-05-18T04:14:03.9754880Z Ran 1 test in 3.254s 2022-05-18T04:14:03.9755054Z 2022-05-18T04:14:03.9755151Z OK 2022-05-18T04:14:03.9755287Z 2022-05-18T04:14:03.9755428Z Generating XML reports... 2022-05-18T04:14:03.9798666Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041400.xml 2022-05-18T04:14:05.1472226Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsi7jh79j 2022-05-18T04:14:05.1473443Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsi7jh79j/_remote_module_non_scriptable.py 2022-05-18T04:14:05.5628400Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:14:05.5643498Z 2022-05-18T04:14:05.5644026Z Running tests... 2022-05-18T04:14:05.5644551Z ---------------------------------------------------------------------- 2022-05-18T04:14:07.1399084Z test_device_maps_invalid_max_remote_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:14:07.1785421Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18271 2022-05-18T04:14:07.1892933Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18272 2022-05-18T04:14:07.2000796Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 18273 2022-05-18T04:14:07.2110883Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 18274 2022-05-18T04:14:08.1295117Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp725r6b4_ 2022-05-18T04:14:08.1296408Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp725r6b4_/_remote_module_non_scriptable.py 2022-05-18T04:14:08.1583635Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8z1ov45b 2022-05-18T04:14:08.1586035Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8z1ov45b/_remote_module_non_scriptable.py 2022-05-18T04:14:08.1839937Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8mbsoshk 2022-05-18T04:14:08.1841778Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8mbsoshk/_remote_module_non_scriptable.py 2022-05-18T04:14:08.1863878Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp07dhvhgn 2022-05-18T04:14:08.1866798Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp07dhvhgn/_remote_module_non_scriptable.py 2022-05-18T04:14:08.5384989Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:14:08.5622560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:14:08.5843394Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:14:08.5884412Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:14:08.8166389Z ok (3.252s) 2022-05-18T04:14:08.8166661Z 2022-05-18T04:14:08.8167069Z ---------------------------------------------------------------------- 2022-05-18T04:14:08.8167411Z Ran 1 test in 3.252s 2022-05-18T04:14:08.8167577Z 2022-05-18T04:14:08.8167671Z OK 2022-05-18T04:14:08.8167790Z 2022-05-18T04:14:08.8167921Z Generating XML reports... 2022-05-18T04:14:08.8211437Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041405.xml 2022-05-18T04:14:09.9635325Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwr3cpzh5 2022-05-18T04:14:09.9636828Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwr3cpzh5/_remote_module_non_scriptable.py 2022-05-18T04:14:10.3636677Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:14:10.3650290Z 2022-05-18T04:14:10.3650443Z Running tests... 2022-05-18T04:14:10.3651195Z ---------------------------------------------------------------------- 2022-05-18T04:14:11.9291875Z test_device_maps_invalid_min_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:14:11.9675521Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18454 2022-05-18T04:14:11.9782001Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18455 2022-05-18T04:14:11.9887570Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 18456 2022-05-18T04:14:11.9995072Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 18457 2022-05-18T04:14:12.8971635Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnvd3ohmm 2022-05-18T04:14:12.8972810Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnvd3ohmm/_remote_module_non_scriptable.py 2022-05-18T04:14:12.9376755Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe2613cz0 2022-05-18T04:14:12.9377757Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvsxz0tik 2022-05-18T04:14:12.9379378Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe2613cz0/_remote_module_non_scriptable.py 2022-05-18T04:14:12.9380456Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvsxz0tik/_remote_module_non_scriptable.py 2022-05-18T04:14:12.9570489Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpueucj_ko 2022-05-18T04:14:12.9572857Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpueucj_ko/_remote_module_non_scriptable.py 2022-05-18T04:14:13.2992909Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:14:13.3382635Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:14:13.3438733Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:14:13.3696472Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:14:13.6055336Z ok (3.240s) 2022-05-18T04:14:13.6055577Z 2022-05-18T04:14:13.6056196Z ---------------------------------------------------------------------- 2022-05-18T04:14:13.6056529Z Ran 1 test in 3.240s 2022-05-18T04:14:13.6056724Z 2022-05-18T04:14:13.6056816Z OK 2022-05-18T04:14:13.6056949Z 2022-05-18T04:14:13.6057095Z Generating XML reports... 2022-05-18T04:14:13.6099999Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041410.xml 2022-05-18T04:14:14.7634411Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj76v1hdg 2022-05-18T04:14:14.7635393Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj76v1hdg/_remote_module_non_scriptable.py 2022-05-18T04:14:15.1636080Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:14:15.1650469Z 2022-05-18T04:14:15.1650875Z Running tests... 2022-05-18T04:14:15.1651309Z ---------------------------------------------------------------------- 2022-05-18T04:14:16.7354335Z test_device_maps_many_to_one (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:14:16.7735872Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18625 2022-05-18T04:14:16.7843082Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18626 2022-05-18T04:14:16.7952345Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 18627 2022-05-18T04:14:16.8059389Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 18628 2022-05-18T04:14:17.6707799Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcjmd5ijf 2022-05-18T04:14:17.6709289Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcjmd5ijf/_remote_module_non_scriptable.py 2022-05-18T04:14:17.6778216Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcxw9knf3 2022-05-18T04:14:17.6781269Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcxw9knf3/_remote_module_non_scriptable.py 2022-05-18T04:14:17.7760164Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_zu35s4d 2022-05-18T04:14:17.7760834Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2qyexw4s 2022-05-18T04:14:17.7761688Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_zu35s4d/_remote_module_non_scriptable.py 2022-05-18T04:14:17.7762242Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2qyexw4s/_remote_module_non_scriptable.py 2022-05-18T04:14:18.0682965Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:14:18.0855448Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:14:18.1766846Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:14:18.1826731Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:14:18.4116360Z ok (3.246s) 2022-05-18T04:14:18.4116674Z 2022-05-18T04:14:18.4117194Z ---------------------------------------------------------------------- 2022-05-18T04:14:18.4117897Z Ran 1 test in 3.247s 2022-05-18T04:14:18.4118063Z 2022-05-18T04:14:18.4118140Z OK 2022-05-18T04:14:18.4118276Z 2022-05-18T04:14:18.4118410Z Generating XML reports... 2022-05-18T04:14:18.4162090Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041415.xml 2022-05-18T04:14:19.5608582Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8h6s48cz 2022-05-18T04:14:19.5609493Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8h6s48cz/_remote_module_non_scriptable.py 2022-05-18T04:14:19.9596634Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:14:19.9610929Z 2022-05-18T04:14:19.9611401Z Running tests... 2022-05-18T04:14:19.9611898Z ---------------------------------------------------------------------- 2022-05-18T04:14:21.5086486Z test_device_maps_missing_config (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:14:21.5471300Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18808 2022-05-18T04:14:21.5577422Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18809 2022-05-18T04:14:21.5686144Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 18810 2022-05-18T04:14:21.5794046Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 18811 2022-05-18T04:14:22.4739714Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv69y8rwv 2022-05-18T04:14:22.4740560Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv69y8rwv/_remote_module_non_scriptable.py 2022-05-18T04:14:22.4905283Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq2q6aahi 2022-05-18T04:14:22.4908184Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq2q6aahi/_remote_module_non_scriptable.py 2022-05-18T04:14:22.5052320Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmperu3lf6g 2022-05-18T04:14:22.5054879Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmperu3lf6g/_remote_module_non_scriptable.py 2022-05-18T04:14:22.5129014Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp51c5tfa6 2022-05-18T04:14:22.5131697Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp51c5tfa6/_remote_module_non_scriptable.py 2022-05-18T04:14:22.8879605Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:14:22.8937540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:14:22.9058652Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:14:22.9130461Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:14:25.0897626Z ok (5.128s) 2022-05-18T04:14:25.0902704Z 2022-05-18T04:14:25.0903206Z ---------------------------------------------------------------------- 2022-05-18T04:14:25.0903703Z Ran 1 test in 5.129s 2022-05-18T04:14:25.0904207Z 2022-05-18T04:14:25.0904321Z OK 2022-05-18T04:14:25.0904466Z 2022-05-18T04:14:25.0904584Z Generating XML reports... 2022-05-18T04:14:25.0947673Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041419.xml 2022-05-18T04:14:26.2554206Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw0i_kxg9 2022-05-18T04:14:26.2555323Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw0i_kxg9/_remote_module_non_scriptable.py 2022-05-18T04:14:26.6657408Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:14:26.6672830Z 2022-05-18T04:14:26.6673092Z Running tests... 2022-05-18T04:14:26.6673542Z ---------------------------------------------------------------------- 2022-05-18T04:14:28.2595429Z test_device_maps_missing_config_loop (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:14:28.2991122Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19151 2022-05-18T04:14:28.3097078Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19152 2022-05-18T04:14:28.3205067Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 19153 2022-05-18T04:14:28.3314802Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 19154 2022-05-18T04:14:29.2039479Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpil8m37xv 2022-05-18T04:14:29.2040651Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpil8m37xv/_remote_module_non_scriptable.py 2022-05-18T04:14:29.2042690Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcy9ednq_ 2022-05-18T04:14:29.2045807Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcy9ednq_/_remote_module_non_scriptable.py 2022-05-18T04:14:29.2158514Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl8ivm45j 2022-05-18T04:14:29.2161309Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl8ivm45j/_remote_module_non_scriptable.py 2022-05-18T04:14:29.2185764Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9l4hfbet 2022-05-18T04:14:29.2188042Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9l4hfbet/_remote_module_non_scriptable.py 2022-05-18T04:14:29.6073878Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:14:29.6103435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:14:29.6196690Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:14:29.6274371Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:14:31.9419374Z ok (5.274s) 2022-05-18T04:14:31.9419608Z 2022-05-18T04:14:31.9420018Z ---------------------------------------------------------------------- 2022-05-18T04:14:31.9420348Z Ran 1 test in 5.275s 2022-05-18T04:14:31.9420548Z 2022-05-18T04:14:31.9420645Z OK 2022-05-18T04:14:31.9420791Z 2022-05-18T04:14:31.9420925Z Generating XML reports... 2022-05-18T04:14:31.9465352Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041426.xml 2022-05-18T04:14:33.1181710Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5yi102xh 2022-05-18T04:14:33.1182831Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5yi102xh/_remote_module_non_scriptable.py 2022-05-18T04:14:33.5288143Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:14:33.5302740Z 2022-05-18T04:14:33.5302956Z Running tests... 2022-05-18T04:14:33.5303409Z ---------------------------------------------------------------------- 2022-05-18T04:14:35.1131784Z test_device_maps_missing_config_not_timeout (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:14:35.1525510Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19494 2022-05-18T04:14:35.1633782Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19495 2022-05-18T04:14:35.1742393Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 19496 2022-05-18T04:14:35.1851149Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 19497 2022-05-18T04:14:36.1360669Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_6zgwkln 2022-05-18T04:14:36.1361620Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_6zgwkln/_remote_module_non_scriptable.py 2022-05-18T04:14:36.1405301Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvn74xdtv 2022-05-18T04:14:36.1407919Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvn74xdtv/_remote_module_non_scriptable.py 2022-05-18T04:14:36.1814805Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw5v3ll_1 2022-05-18T04:14:36.1817426Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw5v3ll_1/_remote_module_non_scriptable.py 2022-05-18T04:14:36.2194178Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgc4jobgu 2022-05-18T04:14:36.2196880Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgc4jobgu/_remote_module_non_scriptable.py 2022-05-18T04:14:36.5393325Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:14:36.5415215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:14:36.5908686Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:14:36.6227566Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:14:38.8959253Z ok (5.365s) 2022-05-18T04:14:38.8959657Z 2022-05-18T04:14:38.8960503Z ---------------------------------------------------------------------- 2022-05-18T04:14:38.8960896Z Ran 1 test in 5.366s 2022-05-18T04:14:38.8961063Z 2022-05-18T04:14:38.8961159Z OK 2022-05-18T04:14:38.8961296Z 2022-05-18T04:14:38.8961429Z Generating XML reports... 2022-05-18T04:14:38.9003162Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041433.xml 2022-05-18T04:14:40.0520564Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmhe8huxy 2022-05-18T04:14:40.0521557Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmhe8huxy/_remote_module_non_scriptable.py 2022-05-18T04:14:40.4528646Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:14:40.4542838Z 2022-05-18T04:14:40.4542979Z Running tests... 2022-05-18T04:14:40.4543672Z ---------------------------------------------------------------------- 2022-05-18T04:14:41.9949600Z test_device_maps_missing_config_remote (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:14:42.0335600Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19837 2022-05-18T04:14:42.0443838Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19838 2022-05-18T04:14:42.0552548Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 19839 2022-05-18T04:14:42.0660510Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 19840 2022-05-18T04:14:42.9320160Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz5eyjb4e 2022-05-18T04:14:42.9321027Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz5eyjb4e/_remote_module_non_scriptable.py 2022-05-18T04:14:42.9531659Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkvv4o5gv 2022-05-18T04:14:42.9534407Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkvv4o5gv/_remote_module_non_scriptable.py 2022-05-18T04:14:42.9818820Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiwbhmkec 2022-05-18T04:14:42.9822686Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiwbhmkec/_remote_module_non_scriptable.py 2022-05-18T04:14:42.9936118Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjs89r9mx 2022-05-18T04:14:42.9938890Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjs89r9mx/_remote_module_non_scriptable.py 2022-05-18T04:14:43.3314437Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:14:43.3658216Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:14:43.3805959Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:14:43.4007359Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:14:45.5765844Z ok (5.122s) 2022-05-18T04:14:45.5766067Z 2022-05-18T04:14:45.5766477Z ---------------------------------------------------------------------- 2022-05-18T04:14:45.5766809Z Ran 1 test in 5.122s 2022-05-18T04:14:45.5766977Z 2022-05-18T04:14:45.5767071Z OK 2022-05-18T04:14:45.5767227Z 2022-05-18T04:14:45.5767364Z Generating XML reports... 2022-05-18T04:14:45.5811377Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041440.xml 2022-05-18T04:14:46.7544549Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp75mo317f 2022-05-18T04:14:46.7546370Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp75mo317f/_remote_module_non_scriptable.py 2022-05-18T04:14:47.1632281Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:14:47.1647242Z 2022-05-18T04:14:47.1647511Z Running tests... 2022-05-18T04:14:47.1647953Z ---------------------------------------------------------------------- 2022-05-18T04:14:48.7443125Z test_device_maps_missing_config_remote_response (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:14:48.7836429Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20180 2022-05-18T04:14:48.7944746Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20181 2022-05-18T04:14:48.8053488Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 20182 2022-05-18T04:14:48.8162589Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 20183 2022-05-18T04:14:49.7588230Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5fm4vh1c 2022-05-18T04:14:49.7588823Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5fm4vh1c/_remote_module_non_scriptable.py 2022-05-18T04:14:49.7599623Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfgctqfva 2022-05-18T04:14:49.7600170Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptmz6aomu 2022-05-18T04:14:49.7602898Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfgctqfva/_remote_module_non_scriptable.py 2022-05-18T04:14:49.7603683Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptmz6aomu/_remote_module_non_scriptable.py 2022-05-18T04:14:49.7713165Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0dypkouk 2022-05-18T04:14:49.7715756Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0dypkouk/_remote_module_non_scriptable.py 2022-05-18T04:14:50.1624720Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:14:50.1660299Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:14:50.1663757Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:14:50.1722335Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:14:52.4268606Z ok (5.262s) 2022-05-18T04:14:52.4268865Z 2022-05-18T04:14:52.4269300Z ---------------------------------------------------------------------- 2022-05-18T04:14:52.4269649Z Ran 1 test in 5.262s 2022-05-18T04:14:52.4269816Z 2022-05-18T04:14:52.4269913Z OK 2022-05-18T04:14:52.4270030Z 2022-05-18T04:14:52.4270168Z Generating XML reports... 2022-05-18T04:14:52.4312870Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041447.xml 2022-05-18T04:14:53.6000097Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7vi2k_x3 2022-05-18T04:14:53.6001279Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7vi2k_x3/_remote_module_non_scriptable.py 2022-05-18T04:14:54.0150130Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:14:54.0164574Z 2022-05-18T04:14:54.0165023Z Running tests... 2022-05-18T04:14:54.0166003Z ---------------------------------------------------------------------- 2022-05-18T04:14:55.6067693Z test_device_maps_missing_config_response (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:14:55.6459852Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20523 2022-05-18T04:14:55.6567656Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20524 2022-05-18T04:14:55.6677839Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 20525 2022-05-18T04:14:55.6787577Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 20526 2022-05-18T04:14:56.5393255Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvfz6wj76 2022-05-18T04:14:56.5393861Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvfz6wj76/_remote_module_non_scriptable.py 2022-05-18T04:14:56.5486663Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzaar7kmv 2022-05-18T04:14:56.5489715Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzaar7kmv/_remote_module_non_scriptable.py 2022-05-18T04:14:56.6103936Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq66w4bwq 2022-05-18T04:14:56.6106014Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq66w4bwq/_remote_module_non_scriptable.py 2022-05-18T04:14:56.6471332Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv4nmsuva 2022-05-18T04:14:56.6473969Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv4nmsuva/_remote_module_non_scriptable.py 2022-05-18T04:14:56.9389105Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:14:56.9452891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:14:57.0180738Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:14:57.0583133Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:14:59.1889976Z ok (5.172s) 2022-05-18T04:14:59.1891967Z 2022-05-18T04:14:59.1892626Z ---------------------------------------------------------------------- 2022-05-18T04:14:59.1892983Z Ran 1 test in 5.172s 2022-05-18T04:14:59.1893133Z 2022-05-18T04:14:59.1893227Z OK 2022-05-18T04:14:59.1893361Z 2022-05-18T04:14:59.1893498Z Generating XML reports... 2022-05-18T04:14:59.1934131Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041454.xml 2022-05-18T04:15:00.3689428Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj5xiurfl 2022-05-18T04:15:00.3690414Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj5xiurfl/_remote_module_non_scriptable.py 2022-05-18T04:15:00.7819499Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:15:00.7834766Z 2022-05-18T04:15:00.7835132Z Running tests... 2022-05-18T04:15:00.7835595Z ---------------------------------------------------------------------- 2022-05-18T04:15:02.3695620Z test_device_maps_missing_config_response_loop (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:15:02.4097637Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20866 2022-05-18T04:15:02.4209104Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20867 2022-05-18T04:15:02.4322565Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 20868 2022-05-18T04:15:02.4433256Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 20869 2022-05-18T04:15:03.3871818Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj7_7nztn 2022-05-18T04:15:03.3872876Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj7_7nztn/_remote_module_non_scriptable.py 2022-05-18T04:15:03.3916565Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg6dll4xy 2022-05-18T04:15:03.3919273Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg6dll4xy/_remote_module_non_scriptable.py 2022-05-18T04:15:03.3929972Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgadhtqu9 2022-05-18T04:15:03.3932886Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgadhtqu9/_remote_module_non_scriptable.py 2022-05-18T04:15:03.4025056Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmpn9u7um 2022-05-18T04:15:03.4028026Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmpn9u7um/_remote_module_non_scriptable.py 2022-05-18T04:15:03.7974473Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:15:03.8001207Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:15:03.8051028Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:15:03.8176791Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:15:06.1548968Z ok (5.371s) 2022-05-18T04:15:06.1549197Z 2022-05-18T04:15:06.1549601Z ---------------------------------------------------------------------- 2022-05-18T04:15:06.1549928Z Ran 1 test in 5.371s 2022-05-18T04:15:06.1550102Z 2022-05-18T04:15:06.1550206Z OK 2022-05-18T04:15:06.1550368Z 2022-05-18T04:15:06.1550504Z Generating XML reports... 2022-05-18T04:15:06.1593880Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041500.xml 2022-05-18T04:15:07.3339266Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5kn_633o 2022-05-18T04:15:07.3340659Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5kn_633o/_remote_module_non_scriptable.py 2022-05-18T04:15:07.7455121Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:15:07.7470394Z 2022-05-18T04:15:07.7470557Z Running tests... 2022-05-18T04:15:07.7471008Z ---------------------------------------------------------------------- 2022-05-18T04:15:09.3296929Z test_device_maps_multi_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:15:09.3682038Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21209 2022-05-18T04:15:09.3790724Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21210 2022-05-18T04:15:09.3898897Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 21211 2022-05-18T04:15:09.4009194Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 21212 2022-05-18T04:15:10.3650590Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpam_zu5ot 2022-05-18T04:15:10.3651445Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpam_zu5ot/_remote_module_non_scriptable.py 2022-05-18T04:15:10.3668714Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqznqq7lw 2022-05-18T04:15:10.3671667Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqznqq7lw/_remote_module_non_scriptable.py 2022-05-18T04:15:10.3857089Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppah1zmub 2022-05-18T04:15:10.3859698Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppah1zmub/_remote_module_non_scriptable.py 2022-05-18T04:15:10.4117611Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdp0nv0c3 2022-05-18T04:15:10.4119922Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdp0nv0c3/_remote_module_non_scriptable.py 2022-05-18T04:15:10.7692932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:15:10.7767894Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:15:10.7896395Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:15:10.8234683Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:15:16.4191396Z ok (8.672s) 2022-05-18T04:15:16.4191733Z 2022-05-18T04:15:16.4192520Z ---------------------------------------------------------------------- 2022-05-18T04:15:16.4193103Z Ran 1 test in 8.672s 2022-05-18T04:15:16.4193274Z 2022-05-18T04:15:16.4194132Z OK 2022-05-18T04:15:16.4194544Z 2022-05-18T04:15:16.4194821Z Generating XML reports... 2022-05-18T04:15:16.4235744Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041507.xml 2022-05-18T04:15:17.5845359Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaz14lsua 2022-05-18T04:15:17.5846478Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaz14lsua/_remote_module_non_scriptable.py 2022-05-18T04:15:17.9963626Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:15:17.9978178Z 2022-05-18T04:15:17.9978471Z Running tests... 2022-05-18T04:15:17.9978918Z ---------------------------------------------------------------------- 2022-05-18T04:15:19.5729250Z test_device_maps_multi_gpu_self (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:15:19.6122217Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21564 2022-05-18T04:15:19.6229916Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21565 2022-05-18T04:15:19.6338969Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 21566 2022-05-18T04:15:19.6449735Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 21567 2022-05-18T04:15:20.5306859Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1m2x2evs 2022-05-18T04:15:20.5308375Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1m2x2evs/_remote_module_non_scriptable.py 2022-05-18T04:15:20.5781430Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp970smiha 2022-05-18T04:15:20.5784042Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp970smiha/_remote_module_non_scriptable.py 2022-05-18T04:15:20.5852215Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpct7izpk0 2022-05-18T04:15:20.5854705Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpct7izpk0/_remote_module_non_scriptable.py 2022-05-18T04:15:20.5973125Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn72g8jr3 2022-05-18T04:15:20.5975815Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn72g8jr3/_remote_module_non_scriptable.py 2022-05-18T04:15:20.9284077Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:15:20.9823160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:15:20.9843395Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:15:21.0071523Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:15:26.5631544Z ok (8.565s) 2022-05-18T04:15:26.5632361Z 2022-05-18T04:15:26.5632772Z ---------------------------------------------------------------------- 2022-05-18T04:15:26.5633122Z Ran 1 test in 8.565s 2022-05-18T04:15:26.5633288Z 2022-05-18T04:15:26.5633384Z OK 2022-05-18T04:15:26.5633525Z 2022-05-18T04:15:26.5633640Z Generating XML reports... 2022-05-18T04:15:26.5676427Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041517.xml 2022-05-18T04:15:27.7247208Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_d5qbps6 2022-05-18T04:15:27.7248188Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_d5qbps6/_remote_module_non_scriptable.py 2022-05-18T04:15:28.1209583Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:15:28.1224527Z 2022-05-18T04:15:28.1224749Z Running tests... 2022-05-18T04:15:28.1225189Z ---------------------------------------------------------------------- 2022-05-18T04:15:29.6736088Z test_device_maps_one_to_many (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:15:29.7122143Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21911 2022-05-18T04:15:29.7231117Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21912 2022-05-18T04:15:29.7338282Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 21913 2022-05-18T04:15:29.7447806Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 21914 2022-05-18T04:15:30.6030269Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppkds5ywl 2022-05-18T04:15:30.6030928Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppkds5ywl/_remote_module_non_scriptable.py 2022-05-18T04:15:30.6371575Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiesiw2iq 2022-05-18T04:15:30.6373778Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiesiw2iq/_remote_module_non_scriptable.py 2022-05-18T04:15:30.6406159Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0_sk7_i3 2022-05-18T04:15:30.6409207Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0_sk7_i3/_remote_module_non_scriptable.py 2022-05-18T04:15:30.6926619Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfkkpn4_k 2022-05-18T04:15:30.6929168Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfkkpn4_k/_remote_module_non_scriptable.py 2022-05-18T04:15:31.0018511Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:15:31.0455786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:15:31.0541921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:15:31.1093974Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:15:31.3505443Z ok (3.228s) 2022-05-18T04:15:31.3505667Z 2022-05-18T04:15:31.3506083Z ---------------------------------------------------------------------- 2022-05-18T04:15:31.3506430Z Ran 1 test in 3.228s 2022-05-18T04:15:31.3506598Z 2022-05-18T04:15:31.3506674Z OK 2022-05-18T04:15:31.3506815Z 2022-05-18T04:15:31.3506949Z Generating XML reports... 2022-05-18T04:15:31.3550408Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041528.xml 2022-05-18T04:15:32.5301515Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmk9encgx 2022-05-18T04:15:32.5302425Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmk9encgx/_remote_module_non_scriptable.py 2022-05-18T04:15:32.9390975Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:15:32.9406048Z 2022-05-18T04:15:32.9406481Z Running tests... 2022-05-18T04:15:32.9406930Z ---------------------------------------------------------------------- 2022-05-18T04:15:34.5132941Z test_device_maps_remote (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:15:34.5523845Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22082 2022-05-18T04:15:34.5632142Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22083 2022-05-18T04:15:34.5741694Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 22084 2022-05-18T04:15:34.5852662Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 22085 2022-05-18T04:15:35.5628887Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe7tq00q7 2022-05-18T04:15:35.5629486Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7xweqei1 2022-05-18T04:15:35.5630034Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe7tq00q7/_remote_module_non_scriptable.py 2022-05-18T04:15:35.5632514Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7xweqei1/_remote_module_non_scriptable.py 2022-05-18T04:15:35.5686347Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0j6ko09b 2022-05-18T04:15:35.5688953Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0j6ko09b/_remote_module_non_scriptable.py 2022-05-18T04:15:35.5787451Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx_eh4lcc 2022-05-18T04:15:35.5790470Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx_eh4lcc/_remote_module_non_scriptable.py 2022-05-18T04:15:35.9671927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:15:35.9672438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:15:35.9736368Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:15:35.9910777Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:15:41.6034423Z ok (8.662s) 2022-05-18T04:15:41.6034773Z 2022-05-18T04:15:41.6035246Z ---------------------------------------------------------------------- 2022-05-18T04:15:41.6035590Z Ran 1 test in 8.663s 2022-05-18T04:15:41.6035758Z 2022-05-18T04:15:41.6035835Z OK 2022-05-18T04:15:41.6035971Z 2022-05-18T04:15:41.6036102Z Generating XML reports... 2022-05-18T04:15:41.6078360Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041532.xml 2022-05-18T04:15:42.7588595Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp3ohtof4 2022-05-18T04:15:42.7589817Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp3ohtof4/_remote_module_non_scriptable.py 2022-05-18T04:15:43.1603664Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:15:43.1618033Z 2022-05-18T04:15:43.1618274Z Running tests... 2022-05-18T04:15:43.1618875Z ---------------------------------------------------------------------- 2022-05-18T04:15:44.7116905Z test_device_maps_return_to_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:15:44.7508761Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22437 2022-05-18T04:15:44.7616531Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22438 2022-05-18T04:15:44.7726584Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 22439 2022-05-18T04:15:44.7840063Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 22440 2022-05-18T04:15:45.7327963Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd6xr7qsy 2022-05-18T04:15:45.7328622Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd6xr7qsy/_remote_module_non_scriptable.py 2022-05-18T04:15:45.7700379Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6c8h5eql 2022-05-18T04:15:45.7701265Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpay4jqwc7 2022-05-18T04:15:45.7702445Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6c8h5eql/_remote_module_non_scriptable.py 2022-05-18T04:15:45.7703013Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpay4jqwc7/_remote_module_non_scriptable.py 2022-05-18T04:15:45.8024218Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1dqcx2vo 2022-05-18T04:15:45.8027067Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1dqcx2vo/_remote_module_non_scriptable.py 2022-05-18T04:15:46.1348230Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:15:46.1724379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:15:46.1837048Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:15:46.2224015Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:15:46.3895868Z skip: Need at least 4 CUDA devices (3.228s) 2022-05-18T04:15:46.3896360Z 2022-05-18T04:15:46.3896758Z ---------------------------------------------------------------------- 2022-05-18T04:15:46.3897310Z Ran 1 test in 3.228s 2022-05-18T04:15:46.3897478Z 2022-05-18T04:15:46.3897591Z OK (skipped=1) 2022-05-18T04:15:46.3897749Z 2022-05-18T04:15:46.3897887Z Generating XML reports... 2022-05-18T04:15:46.3942856Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041543.xml 2022-05-18T04:15:47.5537366Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjs67w0b8 2022-05-18T04:15:47.5538250Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjs67w0b8/_remote_module_non_scriptable.py 2022-05-18T04:15:47.9517533Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:15:47.9531364Z 2022-05-18T04:15:47.9531638Z Running tests... 2022-05-18T04:15:47.9532080Z ---------------------------------------------------------------------- 2022-05-18T04:15:49.5216062Z test_device_maps_return_to_gpu_self (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:15:49.5605805Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22608 2022-05-18T04:15:49.5714453Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22609 2022-05-18T04:15:49.5821539Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 22610 2022-05-18T04:15:49.5931388Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 22611 2022-05-18T04:15:50.4650140Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5591rhbd 2022-05-18T04:15:50.4651344Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5591rhbd/_remote_module_non_scriptable.py 2022-05-18T04:15:50.4681683Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp54jpdu4e 2022-05-18T04:15:50.4684740Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp54jpdu4e/_remote_module_non_scriptable.py 2022-05-18T04:15:50.4930972Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt4jhz1uf 2022-05-18T04:15:50.4933650Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt4jhz1uf/_remote_module_non_scriptable.py 2022-05-18T04:15:50.5353602Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdgx_0mye 2022-05-18T04:15:50.5356268Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdgx_0mye/_remote_module_non_scriptable.py 2022-05-18T04:15:50.8703533Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:15:50.8868965Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:15:50.8918347Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:15:50.9467270Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:15:51.0987552Z skip: Need at least 4 CUDA devices (3.145s) 2022-05-18T04:15:51.0987951Z 2022-05-18T04:15:51.0988348Z ---------------------------------------------------------------------- 2022-05-18T04:15:51.0988709Z Ran 1 test in 3.146s 2022-05-18T04:15:51.0988856Z 2022-05-18T04:15:51.0988970Z OK (skipped=1) 2022-05-18T04:15:51.0989125Z 2022-05-18T04:15:51.0989252Z Generating XML reports... 2022-05-18T04:15:51.1032083Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041547.xml 2022-05-18T04:15:52.2632124Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8zhseiti 2022-05-18T04:15:52.2633417Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8zhseiti/_remote_module_non_scriptable.py 2022-05-18T04:15:52.6761267Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:15:52.6776296Z 2022-05-18T04:15:52.6776742Z Running tests... 2022-05-18T04:15:52.6777244Z ---------------------------------------------------------------------- 2022-05-18T04:15:54.2600253Z test_device_maps_wrong_worker_name (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:15:54.2993473Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22779 2022-05-18T04:15:54.3103004Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22780 2022-05-18T04:15:54.3213004Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 22781 2022-05-18T04:15:54.3324593Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 22782 2022-05-18T04:15:55.2361006Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjqmy_kyd 2022-05-18T04:15:55.2361624Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjqmy_kyd/_remote_module_non_scriptable.py 2022-05-18T04:15:55.2675083Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg1_d81kt 2022-05-18T04:15:55.2677937Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg1_d81kt/_remote_module_non_scriptable.py 2022-05-18T04:15:55.2924268Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7o43pn2q 2022-05-18T04:15:55.2927285Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7o43pn2q/_remote_module_non_scriptable.py 2022-05-18T04:15:55.3057817Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxqxn22c8 2022-05-18T04:15:55.3060749Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxqxn22c8/_remote_module_non_scriptable.py 2022-05-18T04:15:55.6356328Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:15:55.6874361Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:15:55.7049040Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:15:55.7168727Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:15:56.0384590Z ok (3.360s) 2022-05-18T04:15:56.0384841Z 2022-05-18T04:15:56.0385227Z ---------------------------------------------------------------------- 2022-05-18T04:15:56.0385572Z Ran 1 test in 3.361s 2022-05-18T04:15:56.0385739Z 2022-05-18T04:15:56.0385835Z OK 2022-05-18T04:15:56.0385972Z 2022-05-18T04:15:56.0386113Z Generating XML reports... 2022-05-18T04:15:56.0429101Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041552.xml 2022-05-18T04:15:57.2162127Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt8bceuio 2022-05-18T04:15:57.2164160Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt8bceuio/_remote_module_non_scriptable.py 2022-05-18T04:15:57.6319583Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:15:57.6334499Z 2022-05-18T04:15:57.6334869Z Running tests... 2022-05-18T04:15:57.6335349Z ---------------------------------------------------------------------- 2022-05-18T04:15:59.2102229Z test_device_mismatch (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:15:59.2531465Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22962 2022-05-18T04:15:59.2640342Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22963 2022-05-18T04:15:59.2748938Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 22964 2022-05-18T04:15:59.2860914Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 22965 2022-05-18T04:16:00.1683771Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeaccwhak 2022-05-18T04:16:00.1685348Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeaccwhak/_remote_module_non_scriptable.py 2022-05-18T04:16:00.1705900Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb5xe5g95 2022-05-18T04:16:00.1709254Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb5xe5g95/_remote_module_non_scriptable.py 2022-05-18T04:16:00.2121351Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprr013emz 2022-05-18T04:16:00.2123904Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprr013emz/_remote_module_non_scriptable.py 2022-05-18T04:16:00.2305157Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiehyw07j 2022-05-18T04:16:00.2307845Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiehyw07j/_remote_module_non_scriptable.py 2022-05-18T04:16:00.5659153Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:16:00.5858965Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:16:00.6140853Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:16:00.6309674Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:16:03.5484401Z On WorkerInfo(id=1, name=worker1): 2022-05-18T04:16:03.5498616Z RuntimeError('Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!\nException raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f71438ad1bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7f71438a8b8e in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7f714ee47bfb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7f714ee4a03f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7f714ee4b807 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7f714f0194cf in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x2ca0646 (0x7f71469de646 in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #7: + 0x2ca0766 (0x7f71469de766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7f714f8e1f78 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x2bbc355 (0x7f7150c7f355 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: + 0x2bbcae9 (0x7f7150c7fae9 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7f714f90d5e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: + 0x2c3427 (0x7f7159918427 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #13: + 0x2c3766 (0x7f7159918766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #14: + 0x1bfb9c (0x55a83e9c4b9c in /opt/conda/bin/python)\nframe #15: + 0x18e1bb (0x55a83e9931bb in /opt/conda/bin/python)\nframe #16: + 0x18e391 (0x55a83e993391 in /opt/conda/bin/python)\nframe #17: PyNumber_Add + 0x3d (0x55a83e942ffd in /opt/conda/bin/python)\nframe #18: _PyEval_EvalFrameDefault + 0xe1d (0x55a83e9db1fd in /opt/conda/bin/python)\nframe #19: _PyFunction_Vectorcall + 0x104 (0x55a83e99c284 in /opt/conda/bin/python)\nframe #20: _PyObject_Call + 0x1da (0x55a83e94aa7a in /opt/conda/bin/python)\nframe #21: _PyEval_EvalFrameDefault + 0x2610 (0x55a83e9dc9f0 in /opt/conda/bin/python)\nframe #22: _PyFunction_Vectorcall + 0x104 (0x55a83e99c284 in /opt/conda/bin/python)\nframe #23: _PyObject_Call + 0x1da (0x55a83e94aa7a in /opt/conda/bin/python)\nframe #24: + 0x94774a (0x7f7159f9c74a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #25: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f7159f9aa3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #26: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7f7159f9db25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #27: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7f7159fa1776 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #28: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7f7151daaabc in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #29: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f7159f9d915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #30: + 0x3ce0e43 (0x7f7151da3e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #31: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f7151da4a38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #32: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) 
const + 0x57 (0x7f7151d9f0b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #33: + 0x3d10b42 (0x7f7151dd3b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #34: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f714389b5eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #35: + 0xc9039 (0x7f715cfdd039 in /opt/conda/bin/../lib/libstdc++.so.6)\nframe #36: + 0x76db (0x7f71925e26db in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #37: clone + 0x3f (0x7f719230b61f in /lib/x86_64-linux-gnu/libc.so.6)\n') 2022-05-18T04:16:03.5506675Z Traceback (most recent call last): 2022-05-18T04:16:03.5507250Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:16:03.5507708Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:16:03.5508365Z File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 6267, in _gpu_add_wrong_gpus 2022-05-18T04:16:03.5508785Z return x.cpu() + y.cuda() 2022-05-18T04:16:03.5509191Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! 2022-05-18T04:16:03.5509717Z Exception raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first): 2022-05-18T04:16:03.5510578Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f71438ad1bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.5511553Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7f71438a8b8e in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.5512455Z frame #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7f714ee47bfb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5513264Z frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7f714ee4a03f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5514267Z frame #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7f714ee4b807 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5515170Z frame #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7f714f0194cf in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5515895Z frame #6: + 0x2ca0646 (0x7f71469de646 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:16:03.5516546Z frame #7: + 0x2ca0766 (0x7f71469de766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:16:03.5517360Z frame #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7f714f8e1f78 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5518088Z frame #9: + 0x2bbc355 (0x7f7150c7f355 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5518733Z frame #10: + 0x2bbcae9 (0x7f7150c7fae9 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5519552Z frame #11: 
at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7f714f90d5e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5520366Z frame #12: + 0x2c3427 (0x7f7159918427 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5521004Z frame #13: + 0x2c3766 (0x7f7159918766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5521444Z frame #14: + 0x1bfb9c (0x55a83e9c4b9c in /opt/conda/bin/python) 2022-05-18T04:16:03.5521856Z frame #15: + 0x18e1bb (0x55a83e9931bb in /opt/conda/bin/python) 2022-05-18T04:16:03.5522258Z frame #16: + 0x18e391 (0x55a83e993391 in /opt/conda/bin/python) 2022-05-18T04:16:03.5522629Z frame #17: PyNumber_Add + 0x3d (0x55a83e942ffd in /opt/conda/bin/python) 2022-05-18T04:16:03.5523050Z frame #18: _PyEval_EvalFrameDefault + 0xe1d (0x55a83e9db1fd in /opt/conda/bin/python) 2022-05-18T04:16:03.5523481Z frame #19: _PyFunction_Vectorcall + 0x104 (0x55a83e99c284 in /opt/conda/bin/python) 2022-05-18T04:16:03.5523883Z frame #20: _PyObject_Call + 0x1da (0x55a83e94aa7a in /opt/conda/bin/python) 2022-05-18T04:16:03.5524282Z frame #21: _PyEval_EvalFrameDefault + 0x2610 (0x55a83e9dc9f0 in /opt/conda/bin/python) 2022-05-18T04:16:03.5524706Z frame #22: _PyFunction_Vectorcall + 0x104 (0x55a83e99c284 in /opt/conda/bin/python) 2022-05-18T04:16:03.5525105Z frame #23: _PyObject_Call + 0x1da (0x55a83e94aa7a in /opt/conda/bin/python) 2022-05-18T04:16:03.5525690Z frame #24: + 0x94774a (0x7f7159f9c74a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5526482Z frame #25: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f7159f9aa3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5527488Z frame #26: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7f7159f9db25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5528605Z frame #27: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7f7159fa1776 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5529877Z frame #28: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7f7151daaabc in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5531165Z frame #29: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f7159f9d915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5532043Z frame #30: + 0x3ce0e43 (0x7f7151da3e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5532976Z frame #31: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f7151da4a38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5534038Z frame #32: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f7151d9f0b7 in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5534831Z frame #33: + 0x3d10b42 (0x7f7151dd3b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5535516Z frame #34: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f714389b5eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.5536076Z frame #35: + 0xc9039 (0x7f715cfdd039 in /opt/conda/bin/../lib/libstdc++.so.6) 2022-05-18T04:16:03.5536635Z frame #36: + 0x76db (0x7f71925e26db in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:16:03.5537135Z frame #37: clone + 0x3f (0x7f719230b61f in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:16:03.5537362Z 2022-05-18T04:16:03.5537382Z 2022-05-18T04:16:03.5715231Z On WorkerInfo(id=0, name=worker0): 2022-05-18T04:16:03.5729078Z RuntimeError('Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!\nException raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7fb726f431bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7fb726f3eb8e in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7fb7324ddbfb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7fb7324e003f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7fb7324e1807 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7fb7326af4cf in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x2ca0646 (0x7fb72a074646 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #7: + 0x2ca0766 (0x7fb72a074766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7fb732f77f78 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x2bbc355 (0x7fb734315355 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: + 0x2bbcae9 (0x7fb734315ae9 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7fb732fa35e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: + 0x2c3427 (0x7fb73cfae427 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #13: + 0x2c3766 (0x7fb73cfae766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #14: + 0x1bfb9c (0x557e04df9b9c in /opt/conda/bin/python)\nframe #15: + 0x18e1bb (0x557e04dc81bb in /opt/conda/bin/python)\nframe #16: + 0x18e391 (0x557e04dc8391 in /opt/conda/bin/python)\nframe #17: PyNumber_Add + 0x3d (0x557e04d77ffd in 
/opt/conda/bin/python)\nframe #18: _PyEval_EvalFrameDefault + 0xe1d (0x557e04e101fd in /opt/conda/bin/python)\nframe #19: _PyFunction_Vectorcall + 0x104 (0x557e04dd1284 in /opt/conda/bin/python)\nframe #20: _PyObject_Call + 0x1da (0x557e04d7fa7a in /opt/conda/bin/python)\nframe #21: _PyEval_EvalFrameDefault + 0x2610 (0x557e04e119f0 in /opt/conda/bin/python)\nframe #22: _PyFunction_Vectorcall + 0x104 (0x557e04dd1284 in /opt/conda/bin/python)\nframe #23: _PyObject_Call + 0x1da (0x557e04d7fa7a in /opt/conda/bin/python)\nframe #24: + 0x94774a (0x7fb73d63274a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #25: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7fb73d630a3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #26: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7fb73d633b25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #27: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7fb73d637776 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #28: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7fb735440abc in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #29: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7fb73d633915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #30: + 0x3ce0e43 (0x7fb735439e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #31: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7fb73543aa38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #32: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7fb7354350b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #33: + 0x3d10b42 (0x7fb735469b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #34: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7fb726f315eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #35: + 0xc9039 (0x7fb740673039 in /opt/conda/bin/../lib/libstdc++.so.6)\nframe #36: + 0x76db (0x7fb775c786db in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #37: clone + 0x3f (0x7fb7759a161f in /lib/x86_64-linux-gnu/libc.so.6)\n') 2022-05-18T04:16:03.5736500Z Traceback (most recent call last): 2022-05-18T04:16:03.5737116Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:16:03.5737586Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:16:03.5738210Z File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 6267, in _gpu_add_wrong_gpus 2022-05-18T04:16:03.5738610Z return x.cpu() + y.cuda() 2022-05-18T04:16:03.5739015Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! 
2022-05-18T04:16:03.5739584Z Exception raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first): 2022-05-18T04:16:03.5740439Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7fb726f431bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.5741422Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7fb726f3eb8e in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.5742312Z frame #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7fb7324ddbfb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5743210Z frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7fb7324e003f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5744782Z frame #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7fb7324e1807 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5745685Z frame #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7fb7326af4cf in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5746392Z frame #6: + 0x2ca0646 (0x7fb72a074646 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:16:03.5747040Z frame #7: + 0x2ca0766 (0x7fb72a074766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:16:03.5747860Z frame #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7fb732f77f78 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5748607Z frame #9: + 0x2bbc355 (0x7fb734315355 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5749254Z frame #10: + 0x2bbcae9 (0x7fb734315ae9 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5750016Z frame #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7fb732fa35e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5750739Z frame #12: + 0x2c3427 (0x7fb73cfae427 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5751402Z frame #13: + 0x2c3766 (0x7fb73cfae766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5751880Z frame #14: + 0x1bfb9c (0x557e04df9b9c in /opt/conda/bin/python) 2022-05-18T04:16:03.5752277Z frame #15: + 0x18e1bb (0x557e04dc81bb in /opt/conda/bin/python) 2022-05-18T04:16:03.5752677Z frame #16: + 0x18e391 (0x557e04dc8391 in /opt/conda/bin/python) 2022-05-18T04:16:03.5753075Z frame #17: PyNumber_Add + 0x3d (0x557e04d77ffd in /opt/conda/bin/python) 2022-05-18T04:16:03.5753476Z frame #18: _PyEval_EvalFrameDefault + 0xe1d (0x557e04e101fd in /opt/conda/bin/python) 2022-05-18T04:16:03.5754005Z frame #19: _PyFunction_Vectorcall + 0x104 (0x557e04dd1284 in /opt/conda/bin/python) 2022-05-18T04:16:03.5754434Z frame #20: _PyObject_Call + 0x1da (0x557e04d7fa7a in /opt/conda/bin/python) 2022-05-18T04:16:03.5754849Z frame #21: _PyEval_EvalFrameDefault + 
0x2610 (0x557e04e119f0 in /opt/conda/bin/python) 2022-05-18T04:16:03.5755248Z frame #22: _PyFunction_Vectorcall + 0x104 (0x557e04dd1284 in /opt/conda/bin/python) 2022-05-18T04:16:03.5755654Z frame #23: _PyObject_Call + 0x1da (0x557e04d7fa7a in /opt/conda/bin/python) 2022-05-18T04:16:03.5756263Z frame #24: + 0x94774a (0x7fb73d63274a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5757031Z frame #25: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7fb73d630a3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5758048Z frame #26: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7fb73d633b25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5759157Z frame #27: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7fb73d637776 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5760457Z frame #28: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7fb735440abc in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5761727Z frame #29: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7fb73d633915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5762607Z frame #30: + 0x3ce0e43 (0x7fb735439e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5763546Z frame #31: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7fb73543aa38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5764599Z frame #32: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7fb7354350b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5765386Z frame #33: + 0x3d10b42 (0x7fb735469b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5766070Z frame #34: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7fb726f315eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.5766573Z frame #35: + 0xc9039 (0x7fb740673039 in /opt/conda/bin/../lib/libstdc++.so.6) 2022-05-18T04:16:03.5767102Z frame #36: + 0x76db (0x7fb775c786db in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:16:03.5767610Z frame #37: clone + 0x3f (0x7fb7759a161f in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:16:03.5767840Z 2022-05-18T04:16:03.5767858Z 2022-05-18T04:16:03.5844921Z On WorkerInfo(id=3, name=worker3): 2022-05-18T04:16:03.5876606Z RuntimeError('Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!\nException raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7fb9ff3841bb in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7fb9ff37fb8e in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7fba0a91ebfb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7fba0a92103f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7fba0a922807 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7fba0aaf04cf in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x2ca0646 (0x7fba024b5646 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #7: + 0x2ca0766 (0x7fba024b5766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7fba0b3b8f78 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x2bbc355 (0x7fba0c756355 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: + 0x2bbcae9 (0x7fba0c756ae9 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7fba0b3e45e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: + 0x2c3427 (0x7fba153ef427 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #13: + 0x2c3766 (0x7fba153ef766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #14: + 0x1bfb9c (0x55816de18b9c in /opt/conda/bin/python)\nframe #15: + 0x18e1bb (0x55816dde71bb in /opt/conda/bin/python)\nframe #16: + 0x18e391 (0x55816dde7391 in /opt/conda/bin/python)\nframe #17: PyNumber_Add + 0x3d (0x55816dd96ffd in /opt/conda/bin/python)\nframe #18: _PyEval_EvalFrameDefault + 0xe1d (0x55816de2f1fd in /opt/conda/bin/python)\nframe #19: _PyFunction_Vectorcall + 0x104 (0x55816ddf0284 in /opt/conda/bin/python)\nframe #20: _PyObject_Call + 0x1da (0x55816dd9ea7a in /opt/conda/bin/python)\nframe #21: _PyEval_EvalFrameDefault + 0x2610 (0x55816de309f0 in /opt/conda/bin/python)\nframe #22: _PyFunction_Vectorcall + 0x104 (0x55816ddf0284 in /opt/conda/bin/python)\nframe #23: _PyObject_Call + 0x1da (0x55816dd9ea7a in /opt/conda/bin/python)\nframe #24: + 0x94774a (0x7fba15a7374a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #25: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7fba15a71a3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #26: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7fba15a74b25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #27: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7fba15a78776 in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #28: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7fba0d881abc in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #29: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7fba15a74915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #30: + 0x3ce0e43 (0x7fba0d87ae43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #31: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7fba0d87ba38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #32: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7fba0d8760b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #33: + 0x3d10b42 (0x7fba0d8aab42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #34: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7fb9ff3725eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #35: + 0xc9039 (0x7fba18ab4039 in /opt/conda/bin/../lib/libstdc++.so.6)\nframe #36: + 0x76db (0x7fba4e0b96db in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #37: clone + 0x3f (0x7fba4dde261f in /lib/x86_64-linux-gnu/libc.so.6)\n') 2022-05-18T04:16:03.5894416Z Traceback (most recent call last): 2022-05-18T04:16:03.5895643Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:16:03.5896688Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:16:03.5898099Z File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 6267, in _gpu_add_wrong_gpus 2022-05-18T04:16:03.5899023Z return x.cpu() + y.cuda() 2022-05-18T04:16:03.5899928Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! 
2022-05-18T04:16:03.5901178Z Exception raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first): 2022-05-18T04:16:03.5903143Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7fb9ff3841bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.5905862Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7fb9ff37fb8e in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.5907929Z frame #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7fba0a91ebfb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5909790Z frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7fba0a92103f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5911874Z frame #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7fba0a922807 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5913961Z frame #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7fba0aaf04cf in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5915812Z frame #6: + 0x2ca0646 (0x7fba024b5646 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:16:03.5917292Z frame #7: + 0x2ca0766 (0x7fba024b5766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:16:03.5919214Z frame #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7fba0b3b8f78 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5920921Z frame #9: + 0x2bbc355 (0x7fba0c756355 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5922421Z frame #10: + 0x2bbcae9 (0x7fba0c756ae9 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5924160Z frame #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7fba0b3e45e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5925808Z frame #12: + 0x2c3427 (0x7fba153ef427 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5927294Z frame #13: + 0x2c3766 (0x7fba153ef766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5928356Z frame #14: + 0x1bfb9c (0x55816de18b9c in /opt/conda/bin/python) 2022-05-18T04:16:03.5929244Z frame #15: + 0x18e1bb (0x55816dde71bb in /opt/conda/bin/python) 2022-05-18T04:16:03.5930312Z frame #16: + 0x18e391 (0x55816dde7391 in /opt/conda/bin/python) 2022-05-18T04:16:03.5931197Z frame #17: PyNumber_Add + 0x3d (0x55816dd96ffd in /opt/conda/bin/python) 2022-05-18T04:16:03.5932140Z frame #18: _PyEval_EvalFrameDefault + 0xe1d (0x55816de2f1fd in /opt/conda/bin/python) 2022-05-18T04:16:03.5933070Z frame #19: _PyFunction_Vectorcall + 0x104 (0x55816ddf0284 in /opt/conda/bin/python) 2022-05-18T04:16:03.5933991Z frame #20: _PyObject_Call + 0x1da (0x55816dd9ea7a in /opt/conda/bin/python) 2022-05-18T04:16:03.5934928Z frame #21: _PyEval_EvalFrameDefault + 
0x2610 (0x55816de309f0 in /opt/conda/bin/python) 2022-05-18T04:16:03.5935850Z frame #22: _PyFunction_Vectorcall + 0x104 (0x55816ddf0284 in /opt/conda/bin/python) 2022-05-18T04:16:03.5936765Z frame #23: _PyObject_Call + 0x1da (0x55816dd9ea7a in /opt/conda/bin/python) 2022-05-18T04:16:03.5938143Z frame #24: + 0x94774a (0x7fba15a7374a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5939973Z frame #25: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7fba15a71a3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5942302Z frame #26: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7fba15a74b25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5945178Z frame #27: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7fba15a78776 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5948036Z frame #28: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7fba0d881abc in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5951044Z frame #29: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7fba15a74915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.5953245Z frame #30: + 0x3ce0e43 (0x7fba0d87ae43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5955444Z frame #31: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7fba0d87ba38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5957932Z frame #32: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7fba0d8760b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5959739Z frame #33: + 0x3d10b42 (0x7fba0d8aab42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.5961309Z frame #34: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7fb9ff3725eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.5962472Z frame #35: + 0xc9039 (0x7fba18ab4039 in /opt/conda/bin/../lib/libstdc++.so.6) 2022-05-18T04:16:03.5963711Z frame #36: + 0x76db (0x7fba4e0b96db in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:16:03.5964821Z frame #37: clone + 0x3f (0x7fba4dde261f in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:16:03.5965343Z 2022-05-18T04:16:03.5965506Z 2022-05-18T04:16:03.5993149Z On WorkerInfo(id=2, name=worker2): 2022-05-18T04:16:03.6021711Z RuntimeError('Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!\nException raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f06728be1bb in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7f06728b9b8e in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7f067de58bfb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7f067de5b03f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7f067de5c807 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7f067e02a4cf in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x2ca0646 (0x7f06759ef646 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #7: + 0x2ca0766 (0x7f06759ef766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)\nframe #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7f067e8f2f78 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x2bbc355 (0x7f067fc90355 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: + 0x2bbcae9 (0x7f067fc90ae9 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7f067e91e5e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: + 0x2c3427 (0x7f0688929427 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #13: + 0x2c3766 (0x7f0688929766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #14: + 0x1bfb9c (0x564c1605db9c in /opt/conda/bin/python)\nframe #15: + 0x18e1bb (0x564c1602c1bb in /opt/conda/bin/python)\nframe #16: + 0x18e391 (0x564c1602c391 in /opt/conda/bin/python)\nframe #17: PyNumber_Add + 0x3d (0x564c15fdbffd in /opt/conda/bin/python)\nframe #18: _PyEval_EvalFrameDefault + 0xe1d (0x564c160741fd in /opt/conda/bin/python)\nframe #19: _PyFunction_Vectorcall + 0x104 (0x564c16035284 in /opt/conda/bin/python)\nframe #20: _PyObject_Call + 0x1da (0x564c15fe3a7a in /opt/conda/bin/python)\nframe #21: _PyEval_EvalFrameDefault + 0x2610 (0x564c160759f0 in /opt/conda/bin/python)\nframe #22: _PyFunction_Vectorcall + 0x104 (0x564c16035284 in /opt/conda/bin/python)\nframe #23: _PyObject_Call + 0x1da (0x564c15fe3a7a in /opt/conda/bin/python)\nframe #24: + 0x94774a (0x7f0688fad74a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #25: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f0688faba3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #26: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7f0688faeb25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #27: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7f0688fb2776 in 
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #28: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7f0680dbbabc in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #29: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f0688fae915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so)\nframe #30: + 0x3ce0e43 (0x7f0680db4e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #31: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f0680db5a38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #32: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f0680db00b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #33: + 0x3d10b42 (0x7f0680de4b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)\nframe #34: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f06728ac5eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so)\nframe #35: + 0xc9039 (0x7f068bfee039 in /opt/conda/bin/../lib/libstdc++.so.6)\nframe #36: + 0x76db (0x7f06c15f36db in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #37: clone + 0x3f (0x7f06c131c61f in /lib/x86_64-linux-gnu/libc.so.6)\n') 2022-05-18T04:16:03.6039698Z Traceback (most recent call last): 2022-05-18T04:16:03.6040928Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:16:03.6041968Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:16:03.6043392Z File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 6267, in _gpu_add_wrong_gpus 2022-05-18T04:16:03.6044339Z return x.cpu() + y.cuda() 2022-05-18T04:16:03.6045377Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! 
2022-05-18T04:16:03.6046651Z Exception raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first): 2022-05-18T04:16:03.6048684Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f06728be1bb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.6050967Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7f06728b9b8e in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.6053058Z frame #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7f067de58bfb in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6054904Z frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7f067de5b03f in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6056983Z frame #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7f067de5c807 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6059248Z frame #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7f067e02a4cf in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6060906Z frame #6: + 0x2ca0646 (0x7f06759ef646 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:16:03.6062382Z frame #7: + 0x2ca0766 (0x7f06759ef766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so) 2022-05-18T04:16:03.6064566Z frame #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7f067e8f2f78 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6066303Z frame #9: + 0x2bbc355 (0x7f067fc90355 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6067777Z frame #10: + 0x2bbcae9 (0x7f067fc90ae9 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6069548Z frame #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7f067e91e5e3 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6071151Z frame #12: + 0x2c3427 (0x7f0688929427 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.6072628Z frame #13: + 0x2c3766 (0x7f0688929766 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.6073672Z frame #14: + 0x1bfb9c (0x564c1605db9c in /opt/conda/bin/python) 2022-05-18T04:16:03.6074596Z frame #15: + 0x18e1bb (0x564c1602c1bb in /opt/conda/bin/python) 2022-05-18T04:16:03.6075471Z frame #16: + 0x18e391 (0x564c1602c391 in /opt/conda/bin/python) 2022-05-18T04:16:03.6076353Z frame #17: PyNumber_Add + 0x3d (0x564c15fdbffd in /opt/conda/bin/python) 2022-05-18T04:16:03.6077295Z frame #18: _PyEval_EvalFrameDefault + 0xe1d (0x564c160741fd in /opt/conda/bin/python) 2022-05-18T04:16:03.6078214Z frame #19: _PyFunction_Vectorcall + 0x104 (0x564c16035284 in /opt/conda/bin/python) 2022-05-18T04:16:03.6079122Z frame #20: _PyObject_Call + 0x1da (0x564c15fe3a7a in /opt/conda/bin/python) 2022-05-18T04:16:03.6080051Z frame #21: _PyEval_EvalFrameDefault + 
0x2610 (0x564c160759f0 in /opt/conda/bin/python) 2022-05-18T04:16:03.6080997Z frame #22: _PyFunction_Vectorcall + 0x104 (0x564c16035284 in /opt/conda/bin/python) 2022-05-18T04:16:03.6082046Z frame #23: _PyObject_Call + 0x1da (0x564c15fe3a7a in /opt/conda/bin/python) 2022-05-18T04:16:03.6083754Z frame #24: + 0x94774a (0x7f0688fad74a in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.6085570Z frame #25: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f0688faba3d in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.6087935Z frame #26: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x85 (0x7f0688faeb25 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.6090533Z frame #27: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7f0688fb2776 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.6093379Z frame #28: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7f0680dbbabc in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6096562Z frame #29: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f0688fae915 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:16:03.6098636Z frame #30: + 0x3ce0e43 (0x7f0680db4e43 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6100818Z frame #31: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f0680db5a38 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6103305Z frame #32: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f0680db00b7 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6105411Z frame #33: + 0x3d10b42 (0x7f0680de4b42 in /opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:16:03.6106976Z frame #34: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f06728ac5eb in /opt/conda/lib/python3.9/site-packages/torch/lib/libc10.so) 2022-05-18T04:16:03.6108115Z frame #35: + 0xc9039 (0x7f068bfee039 in /opt/conda/bin/../lib/libstdc++.so.6) 2022-05-18T04:16:03.6109363Z frame #36: + 0x76db (0x7f06c15f36db in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:16:03.6110477Z frame #37: clone + 0x3f (0x7f06c131c61f in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:16:03.6110980Z 2022-05-18T04:16:03.6111010Z 2022-05-18T04:16:04.1002894Z ok (6.466s) 2022-05-18T04:16:04.1003112Z 2022-05-18T04:16:04.1003517Z ---------------------------------------------------------------------- 2022-05-18T04:16:04.1003859Z Ran 1 test in 6.467s 2022-05-18T04:16:04.1004046Z 2022-05-18T04:16:04.1004123Z OK 2022-05-18T04:16:04.1004261Z 2022-05-18T04:16:04.1004401Z Generating XML reports... 
2022-05-18T04:16:04.1048615Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041557.xml 2022-05-18T04:16:05.2842195Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzahcobe_ 2022-05-18T04:16:05.2843412Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzahcobe_/_remote_module_non_scriptable.py 2022-05-18T04:16:05.6918604Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:16:05.6933699Z 2022-05-18T04:16:05.6933967Z Running tests... 2022-05-18T04:16:05.6934569Z ---------------------------------------------------------------------- 2022-05-18T04:16:07.2751137Z test_devices_option_mismatch (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:16:07.3145858Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23305 2022-05-18T04:16:07.3253758Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23306 2022-05-18T04:16:07.3364160Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 23307 2022-05-18T04:16:07.3473734Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 23308 2022-05-18T04:16:08.2622515Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo5avz1l8 2022-05-18T04:16:08.2623895Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo5avz1l8/_remote_module_non_scriptable.py 2022-05-18T04:16:08.3281870Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp6x_c255 2022-05-18T04:16:08.3283932Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp6x_c255/_remote_module_non_scriptable.py 2022-05-18T04:16:08.3573437Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphstkmq6x 2022-05-18T04:16:08.3576077Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphstkmq6x/_remote_module_non_scriptable.py 2022-05-18T04:16:08.3598677Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplbd99y8m 2022-05-18T04:16:08.3601481Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplbd99y8m/_remote_module_non_scriptable.py 2022-05-18T04:16:08.6600576Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:16:08.7427379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:16:08.7575160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:16:08.7652946Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:16:09.0533580Z ok (3.360s) 2022-05-18T04:16:09.0533898Z 2022-05-18T04:16:09.0534667Z ---------------------------------------------------------------------- 2022-05-18T04:16:09.0535320Z Ran 1 test in 3.360s 2022-05-18T04:16:09.0535494Z 2022-05-18T04:16:09.0535569Z OK 2022-05-18T04:16:09.0535719Z 2022-05-18T04:16:09.0535855Z Generating XML reports... 
2022-05-18T04:16:09.0578925Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041605.xml 2022-05-18T04:16:10.2311244Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp44qwijxj 2022-05-18T04:16:10.2312747Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp44qwijxj/_remote_module_non_scriptable.py 2022-05-18T04:16:10.6403413Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:16:10.6418196Z 2022-05-18T04:16:10.6418672Z Running tests... 2022-05-18T04:16:10.6419644Z ---------------------------------------------------------------------- 2022-05-18T04:16:12.2263857Z test_devices_option_mismatch_reverse (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:16:12.2657331Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23488 2022-05-18T04:16:12.2766629Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23489 2022-05-18T04:16:12.2875731Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 23490 2022-05-18T04:16:12.2985359Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 23491 2022-05-18T04:16:13.2752463Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmj4ndlp4 2022-05-18T04:16:13.2753440Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmj4ndlp4/_remote_module_non_scriptable.py 2022-05-18T04:16:13.2761766Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzw5yquwb 2022-05-18T04:16:13.2764383Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzw5yquwb/_remote_module_non_scriptable.py 2022-05-18T04:16:13.2872006Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi4ruz4d_ 2022-05-18T04:16:13.2874784Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi4ruz4d_/_remote_module_non_scriptable.py 2022-05-18T04:16:13.2880819Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqe5dxi59 2022-05-18T04:16:13.2883920Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqe5dxi59/_remote_module_non_scriptable.py 2022-05-18T04:16:13.6744800Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:16:13.6872481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:16:13.7012324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:16:13.7021313Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:16:14.0044305Z ok (3.362s) 2022-05-18T04:16:14.0044516Z 2022-05-18T04:16:14.0044916Z ---------------------------------------------------------------------- 2022-05-18T04:16:14.0045264Z Ran 1 test in 3.363s 2022-05-18T04:16:14.0045435Z 2022-05-18T04:16:14.0045531Z OK 2022-05-18T04:16:14.0045650Z 2022-05-18T04:16:14.0045782Z Generating XML reports... 
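test_devices_option_mismatch and test_devices_option_mismatch_reverse exercise how the TensorPipe RPC agent validates its CUDA device configuration. A hedged sketch of the API involved (the worker names, ranks, rendezvous settings, and the particular device map below are illustrative assumptions, not values taken from this run):

    import os
    import torch.distributed.rpc as rpc

    # Rendezvous settings are placeholders; a real run also needs a peer process
    # registered as "worker1".
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")

    # Declare which local CUDA devices the agent may use and how this worker's
    # devices map onto the callee's. An inconsistency between `devices` and the
    # configured device maps is the kind of misconfiguration the test names suggest is checked.
    options = rpc.TensorPipeRpcBackendOptions(
        num_worker_threads=8,
        devices=["cuda:0"],
    )
    options.set_device_map("worker1", {"cuda:0": "cuda:0"})

    rpc.init_rpc("worker0", rank=0, world_size=2, rpc_backend_options=options)
    # ... issue RPCs that carry CUDA tensors ...
    rpc.shutdown()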
2022-05-18T04:16:14.0090178Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041610.xml 2022-05-18T04:16:15.1840130Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn40al1v9 2022-05-18T04:16:15.1841486Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn40al1v9/_remote_module_non_scriptable.py 2022-05-18T04:16:15.5943422Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:16:15.5958482Z 2022-05-18T04:16:15.5958682Z Running tests... 2022-05-18T04:16:15.5959162Z ---------------------------------------------------------------------- 2022-05-18T04:16:17.1777230Z test_meta_multiple_tensors (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:16:17.2170018Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23671 2022-05-18T04:16:17.2277989Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23672 2022-05-18T04:16:17.2387065Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 23673 2022-05-18T04:16:17.2495933Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 23674 2022-05-18T04:16:18.1736449Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7w4yxwf7 2022-05-18T04:16:18.1737534Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7w4yxwf7/_remote_module_non_scriptable.py 2022-05-18T04:16:18.2204004Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7_rns3tu 2022-05-18T04:16:18.2206011Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7_rns3tu/_remote_module_non_scriptable.py 2022-05-18T04:16:18.2398897Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg7z0p2_h 2022-05-18T04:16:18.2401790Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg7z0p2_h/_remote_module_non_scriptable.py 2022-05-18T04:16:18.2675271Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpot18pxxs 2022-05-18T04:16:18.2678012Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpot18pxxs/_remote_module_non_scriptable.py 2022-05-18T04:16:18.5712229Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:16:18.6199901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:16:18.6546453Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:16:18.6683081Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:16:22.0629762Z ok (6.467s) 2022-05-18T04:16:22.0630153Z 2022-05-18T04:16:22.0630845Z ---------------------------------------------------------------------- 2022-05-18T04:16:22.0631450Z Ran 1 test in 6.467s 2022-05-18T04:16:22.0631742Z 2022-05-18T04:16:22.0631918Z OK 2022-05-18T04:16:22.0632173Z 2022-05-18T04:16:22.0632405Z Generating XML reports... 
2022-05-18T04:16:22.0676060Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041615.xml 2022-05-18T04:16:23.2296987Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl2_5glds 2022-05-18T04:16:23.2298261Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl2_5glds/_remote_module_non_scriptable.py 2022-05-18T04:16:23.6429631Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:16:23.6444930Z 2022-05-18T04:16:23.6445083Z Running tests... 2022-05-18T04:16:23.6445531Z ---------------------------------------------------------------------- 2022-05-18T04:16:25.2378619Z test_owner_rref_forward_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:16:25.2773600Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24014 2022-05-18T04:16:25.2882238Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24015 2022-05-18T04:16:25.2991580Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 24016 2022-05-18T04:16:25.3100484Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 24017 2022-05-18T04:16:26.2811726Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpowp9a4bl 2022-05-18T04:16:26.2812601Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpowp9a4bl/_remote_module_non_scriptable.py 2022-05-18T04:16:26.2820632Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm4_dqxgy 2022-05-18T04:16:26.2823316Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm4_dqxgy/_remote_module_non_scriptable.py 2022-05-18T04:16:26.3100054Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsltlfh65 2022-05-18T04:16:26.3103000Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsltlfh65/_remote_module_non_scriptable.py 2022-05-18T04:16:26.3120806Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv75w_w30 2022-05-18T04:16:26.3123712Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv75w_w30/_remote_module_non_scriptable.py 2022-05-18T04:16:26.6835926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:16:26.6836475Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:16:26.7057706Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:16:26.7244023Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:16:31.4252933Z ok (7.780s) 2022-05-18T04:16:31.4253158Z 2022-05-18T04:16:31.4253551Z ---------------------------------------------------------------------- 2022-05-18T04:16:31.4253899Z Ran 1 test in 7.781s 2022-05-18T04:16:31.4254078Z 2022-05-18T04:16:31.4254175Z OK 2022-05-18T04:16:31.4254310Z 2022-05-18T04:16:31.4254444Z Generating XML reports... 
2022-05-18T04:16:31.4299412Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041623.xml 2022-05-18T04:16:32.6065530Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpga17pviz 2022-05-18T04:16:32.6066604Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpga17pviz/_remote_module_non_scriptable.py 2022-05-18T04:16:33.0176128Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:16:33.0190973Z 2022-05-18T04:16:33.0191352Z Running tests... 2022-05-18T04:16:33.0191821Z ---------------------------------------------------------------------- 2022-05-18T04:16:34.6059863Z test_owner_rref_forward_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:16:34.6452063Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24228 2022-05-18T04:16:34.6560045Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24229 2022-05-18T04:16:34.6668590Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 24230 2022-05-18T04:16:34.6778507Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 24231 2022-05-18T04:16:35.5910562Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp9ohp76s 2022-05-18T04:16:35.5911459Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp9ohp76s/_remote_module_non_scriptable.py 2022-05-18T04:16:35.6063428Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpggne_afc 2022-05-18T04:16:35.6066302Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpggne_afc/_remote_module_non_scriptable.py 2022-05-18T04:16:35.6156223Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4nls0c51 2022-05-18T04:16:35.6159075Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4nls0c51/_remote_module_non_scriptable.py 2022-05-18T04:16:35.6191291Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr0ju0ojp 2022-05-18T04:16:35.6194267Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr0ju0ojp/_remote_module_non_scriptable.py 2022-05-18T04:16:35.9931924Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:16:36.0188163Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:16:36.0207864Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:16:36.0212875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:16:42.5968886Z ok (9.577s) 2022-05-18T04:16:42.5969109Z 2022-05-18T04:16:42.5969525Z ---------------------------------------------------------------------- 2022-05-18T04:16:42.5969875Z Ran 1 test in 9.578s 2022-05-18T04:16:42.5970021Z 2022-05-18T04:16:42.5970118Z OK 2022-05-18T04:16:42.5970274Z 2022-05-18T04:16:42.5970411Z Generating XML reports... 
2022-05-18T04:16:42.6014426Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041633.xml 2022-05-18T04:16:43.7721448Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp47essb1m 2022-05-18T04:16:43.7722619Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp47essb1m/_remote_module_non_scriptable.py 2022-05-18T04:16:44.1719557Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:16:44.1733452Z 2022-05-18T04:16:44.1733595Z Running tests... 2022-05-18T04:16:44.1734245Z ---------------------------------------------------------------------- 2022-05-18T04:16:45.7181893Z test_owner_rref_forward_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:16:45.7566134Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24443 2022-05-18T04:16:45.7672465Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24444 2022-05-18T04:16:45.7780284Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 24445 2022-05-18T04:16:45.7888783Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 24446 2022-05-18T04:16:46.7351712Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp422t5jef 2022-05-18T04:16:46.7352320Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp422t5jef/_remote_module_non_scriptable.py 2022-05-18T04:16:46.7402563Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpynzeh10l 2022-05-18T04:16:46.7405131Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpynzeh10l/_remote_module_non_scriptable.py 2022-05-18T04:16:46.7561196Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp68z6gpt3 2022-05-18T04:16:46.7563839Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp68z6gpt3/_remote_module_non_scriptable.py 2022-05-18T04:16:46.7809288Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu946rfhr 2022-05-18T04:16:46.7812458Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu946rfhr/_remote_module_non_scriptable.py 2022-05-18T04:16:47.1389398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:16:47.1414977Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:16:47.1614277Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:16:47.1956286Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:16:53.4072859Z ok (9.234s) 2022-05-18T04:16:53.4073138Z 2022-05-18T04:16:53.4073739Z ---------------------------------------------------------------------- 2022-05-18T04:16:53.4074107Z Ran 1 test in 9.234s 2022-05-18T04:16:53.4074275Z 2022-05-18T04:16:53.4074372Z OK 2022-05-18T04:16:53.4074507Z 2022-05-18T04:16:53.4074643Z Generating XML reports... 
2022-05-18T04:16:53.4117825Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041644.xml 2022-05-18T04:16:54.5845335Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk__098z3 2022-05-18T04:16:54.5846563Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk__098z3/_remote_module_non_scriptable.py 2022-05-18T04:16:54.9945961Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:16:54.9961192Z 2022-05-18T04:16:54.9961318Z Running tests... 2022-05-18T04:16:54.9961771Z ---------------------------------------------------------------------- 2022-05-18T04:16:56.5690543Z test_owner_rref_forward_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:16:56.6077729Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24658 2022-05-18T04:16:56.6184165Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24659 2022-05-18T04:16:56.6294313Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 24660 2022-05-18T04:16:56.6405224Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 24661 2022-05-18T04:16:57.5012637Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1qa_3zsb 2022-05-18T04:16:57.5013782Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1qa_3zsb/_remote_module_non_scriptable.py 2022-05-18T04:16:57.5056614Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpny0weytm 2022-05-18T04:16:57.5059728Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpny0weytm/_remote_module_non_scriptable.py 2022-05-18T04:16:57.5374866Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpebypt9s9 2022-05-18T04:16:57.5377505Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpebypt9s9/_remote_module_non_scriptable.py 2022-05-18T04:16:57.5812714Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp42cl0tfc 2022-05-18T04:16:57.5815111Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp42cl0tfc/_remote_module_non_scriptable.py 2022-05-18T04:16:57.9039913Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:16:57.9069257Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:16:57.9504185Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:16:57.9878324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:17:02.7558375Z ok (7.759s) 2022-05-18T04:17:02.7558703Z 2022-05-18T04:17:02.7559218Z ---------------------------------------------------------------------- 2022-05-18T04:17:02.7559605Z Ran 1 test in 7.760s 2022-05-18T04:17:02.7559760Z 2022-05-18T04:17:02.7559853Z OK 2022-05-18T04:17:02.7559991Z 2022-05-18T04:17:02.7560126Z Generating XML reports... 
2022-05-18T04:17:02.7603587Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041654.xml 2022-05-18T04:17:03.9334054Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv53stn5y 2022-05-18T04:17:03.9335498Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv53stn5y/_remote_module_non_scriptable.py 2022-05-18T04:17:04.3465389Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:17:04.3480724Z 2022-05-18T04:17:04.3480864Z Running tests... 2022-05-18T04:17:04.3481619Z ---------------------------------------------------------------------- 2022-05-18T04:17:05.9340922Z test_rref_as_arg_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:17:05.9732939Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24872 2022-05-18T04:17:05.9840716Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24873 2022-05-18T04:17:05.9951328Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 24874 2022-05-18T04:17:06.0062387Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 24875 2022-05-18T04:17:06.9031125Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnqzs_jh2 2022-05-18T04:17:06.9032278Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnqzs_jh2/_remote_module_non_scriptable.py 2022-05-18T04:17:06.9279400Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp22gsfvyb 2022-05-18T04:17:06.9281627Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp22gsfvyb/_remote_module_non_scriptable.py 2022-05-18T04:17:06.9463007Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7m_x4k_k 2022-05-18T04:17:06.9465248Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7m_x4k_k/_remote_module_non_scriptable.py 2022-05-18T04:17:06.9909588Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiy6he3ss 2022-05-18T04:17:06.9911776Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiy6he3ss/_remote_module_non_scriptable.py 2022-05-18T04:17:07.3006604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:17:07.3328887Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:17:07.3522666Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:17:07.3899183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:17:20.0383940Z ok (15.690s) 2022-05-18T04:17:20.0384181Z 2022-05-18T04:17:20.0384838Z ---------------------------------------------------------------------- 2022-05-18T04:17:20.0385178Z Ran 1 test in 15.690s 2022-05-18T04:17:20.0385344Z 2022-05-18T04:17:20.0385439Z OK 2022-05-18T04:17:20.0385951Z 2022-05-18T04:17:20.0386111Z Generating XML reports... 
2022-05-18T04:17:20.0429131Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041704.xml 2022-05-18T04:17:21.2083329Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpen28i85y 2022-05-18T04:17:21.2084517Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpen28i85y/_remote_module_non_scriptable.py 2022-05-18T04:17:21.6171756Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:17:21.6186236Z 2022-05-18T04:17:21.6186511Z Running tests... 2022-05-18T04:17:21.6186948Z ---------------------------------------------------------------------- 2022-05-18T04:17:23.2010097Z test_rref_as_arg_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:17:23.2403358Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25215 2022-05-18T04:17:23.2511142Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25216 2022-05-18T04:17:23.2618406Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 25217 2022-05-18T04:17:23.2727900Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 25218 2022-05-18T04:17:24.1314298Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0kqjm6m3 2022-05-18T04:17:24.1315240Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0kqjm6m3/_remote_module_non_scriptable.py 2022-05-18T04:17:24.1370911Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprtpd_wyc 2022-05-18T04:17:24.1374097Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprtpd_wyc/_remote_module_non_scriptable.py 2022-05-18T04:17:24.1624836Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe5t_7wws 2022-05-18T04:17:24.1627432Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe5t_7wws/_remote_module_non_scriptable.py 2022-05-18T04:17:24.2017716Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5sqgvx2_ 2022-05-18T04:17:24.2020426Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5sqgvx2_/_remote_module_non_scriptable.py 2022-05-18T04:17:24.5367252Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:17:24.5378032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:17:24.5654272Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:17:24.6007007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:17:39.9110204Z ok (18.292s) 2022-05-18T04:17:39.9110425Z 2022-05-18T04:17:39.9110939Z ---------------------------------------------------------------------- 2022-05-18T04:17:39.9111410Z Ran 1 test in 18.292s 2022-05-18T04:17:39.9111577Z 2022-05-18T04:17:39.9111670Z OK 2022-05-18T04:17:39.9111805Z 2022-05-18T04:17:39.9111920Z Generating XML reports... 
2022-05-18T04:17:39.9155705Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041721.xml 2022-05-18T04:17:41.0773076Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwb6j2_yv 2022-05-18T04:17:41.0774393Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwb6j2_yv/_remote_module_non_scriptable.py 2022-05-18T04:17:41.4759146Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:17:41.4772768Z 2022-05-18T04:17:41.4773184Z Running tests... 2022-05-18T04:17:41.4773623Z ---------------------------------------------------------------------- 2022-05-18T04:17:43.0323930Z test_rref_as_arg_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:17:43.0711140Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25564 2022-05-18T04:17:43.0816699Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25565 2022-05-18T04:17:43.0925278Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 25566 2022-05-18T04:17:43.1035035Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 25567 2022-05-18T04:17:43.9833631Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1nt_vgug 2022-05-18T04:17:43.9835094Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1nt_vgug/_remote_module_non_scriptable.py 2022-05-18T04:17:43.9859348Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9ubtt_s2 2022-05-18T04:17:43.9862328Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9ubtt_s2/_remote_module_non_scriptable.py 2022-05-18T04:17:44.0385042Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpebz2pfo3 2022-05-18T04:17:44.0387145Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpebz2pfo3/_remote_module_non_scriptable.py 2022-05-18T04:17:44.0397147Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp78wyeypt 2022-05-18T04:17:44.0399556Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp78wyeypt/_remote_module_non_scriptable.py 2022-05-18T04:17:44.3832399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:17:44.3906681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:17:44.4402779Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:17:44.4422012Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:17:57.2351057Z ok (15.757s) 2022-05-18T04:17:57.2351303Z 2022-05-18T04:17:57.2351930Z ---------------------------------------------------------------------- 2022-05-18T04:17:57.2352262Z Ran 1 test in 15.758s 2022-05-18T04:17:57.2352426Z 2022-05-18T04:17:57.2352519Z OK 2022-05-18T04:17:57.2352653Z 2022-05-18T04:17:57.2352786Z Generating XML reports... 
2022-05-18T04:17:57.2395106Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041741.xml 2022-05-18T04:17:58.3848463Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpggbkpa77 2022-05-18T04:17:58.3849710Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpggbkpa77/_remote_module_non_scriptable.py 2022-05-18T04:17:58.7834969Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:17:58.7848597Z 2022-05-18T04:17:58.7848835Z Running tests... 2022-05-18T04:17:58.7849396Z ---------------------------------------------------------------------- 2022-05-18T04:18:00.3350792Z test_rref_as_arg_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:18:00.3737260Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25907 2022-05-18T04:18:00.3845547Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25908 2022-05-18T04:18:00.3955053Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 25909 2022-05-18T04:18:00.4065775Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 25910 2022-05-18T04:18:01.2789495Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpraf08r8_ 2022-05-18T04:18:01.2790347Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpraf08r8_/_remote_module_non_scriptable.py 2022-05-18T04:18:01.2857671Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeew4xzgj 2022-05-18T04:18:01.2860641Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeew4xzgj/_remote_module_non_scriptable.py 2022-05-18T04:18:01.3147845Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpes3lyttm 2022-05-18T04:18:01.3150192Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpes3lyttm/_remote_module_non_scriptable.py 2022-05-18T04:18:01.3325304Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv1oayix_ 2022-05-18T04:18:01.3328119Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv1oayix_/_remote_module_non_scriptable.py 2022-05-18T04:18:01.6852312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:18:01.6978926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:18:01.7176681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:18:01.7366846Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:18:16.6450563Z ok (17.860s) 2022-05-18T04:18:16.6450794Z 2022-05-18T04:18:16.6451217Z ---------------------------------------------------------------------- 2022-05-18T04:18:16.6451562Z Ran 1 test in 17.860s 2022-05-18T04:18:16.6451730Z 2022-05-18T04:18:16.6451825Z OK 2022-05-18T04:18:16.6451964Z 2022-05-18T04:18:16.6452102Z Generating XML reports... 
2022-05-18T04:18:16.6495209Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041758.xml 2022-05-18T04:18:17.8232147Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpply0kqnl 2022-05-18T04:18:17.8233053Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpply0kqnl/_remote_module_non_scriptable.py 2022-05-18T04:18:18.2323449Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:18:18.2337953Z 2022-05-18T04:18:18.2338439Z Running tests... 2022-05-18T04:18:18.2339043Z ---------------------------------------------------------------------- 2022-05-18T04:18:19.8129661Z test_rref_as_arg_synchronization5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:18:19.8521573Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26256 2022-05-18T04:18:19.8629511Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26257 2022-05-18T04:18:19.8737166Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 26258 2022-05-18T04:18:19.8847684Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 26259 2022-05-18T04:18:20.7672397Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo1y_6k7r 2022-05-18T04:18:20.7673507Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo1y_6k7r/_remote_module_non_scriptable.py 2022-05-18T04:18:20.7676023Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmt19ulge 2022-05-18T04:18:20.7679695Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmt19ulge/_remote_module_non_scriptable.py 2022-05-18T04:18:20.7708159Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4fqkvwe6 2022-05-18T04:18:20.7711162Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4fqkvwe6/_remote_module_non_scriptable.py 2022-05-18T04:18:20.8421602Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvj_zytwz 2022-05-18T04:18:20.8424064Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvj_zytwz/_remote_module_non_scriptable.py 2022-05-18T04:18:21.1709067Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:18:21.1710614Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:18:21.1827985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:18:21.2444503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:18:34.1170653Z ok (15.883s) 2022-05-18T04:18:34.1171050Z 2022-05-18T04:18:34.1171742Z ---------------------------------------------------------------------- 2022-05-18T04:18:34.1172340Z Ran 1 test in 15.883s 2022-05-18T04:18:34.1172649Z 2022-05-18T04:18:34.1172814Z OK 2022-05-18T04:18:34.1173065Z 2022-05-18T04:18:34.1173285Z Generating XML reports... 
2022-05-18T04:18:34.1216606Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041818.xml 2022-05-18T04:18:35.2851320Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7eep_enp 2022-05-18T04:18:35.2852484Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7eep_enp/_remote_module_non_scriptable.py 2022-05-18T04:18:35.6953265Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:18:35.6967677Z 2022-05-18T04:18:35.6968173Z Running tests... 2022-05-18T04:18:35.6968676Z ---------------------------------------------------------------------- 2022-05-18T04:18:37.2806742Z test_rref_forward_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:18:37.3201398Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26599 2022-05-18T04:18:37.3311104Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26600 2022-05-18T04:18:37.3420954Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 26601 2022-05-18T04:18:37.3531604Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 26602 2022-05-18T04:18:38.2878809Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgto05pme 2022-05-18T04:18:38.2879786Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgto05pme/_remote_module_non_scriptable.py 2022-05-18T04:18:38.3389526Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppv32wsxf 2022-05-18T04:18:38.3391595Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppv32wsxf/_remote_module_non_scriptable.py 2022-05-18T04:18:38.3448944Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnux3y99a 2022-05-18T04:18:38.3449561Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_9recws6 2022-05-18T04:18:38.3451245Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnux3y99a/_remote_module_non_scriptable.py 2022-05-18T04:18:38.3453291Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_9recws6/_remote_module_non_scriptable.py 2022-05-18T04:18:38.7030367Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:18:38.7416566Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:18:38.7475099Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:18:38.7527773Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:18:50.1832135Z ok (14.486s) 2022-05-18T04:18:50.1832343Z 2022-05-18T04:18:50.1832756Z ---------------------------------------------------------------------- 2022-05-18T04:18:50.1833104Z Ran 1 test in 14.486s 2022-05-18T04:18:50.1833253Z 2022-05-18T04:18:50.1833350Z OK 2022-05-18T04:18:50.1833486Z 2022-05-18T04:18:50.1833942Z Generating XML reports... 
2022-05-18T04:18:50.1877553Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041835.xml 2022-05-18T04:18:51.3339120Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmgholbm2 2022-05-18T04:18:51.3339948Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmgholbm2/_remote_module_non_scriptable.py 2022-05-18T04:18:51.7333403Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:18:51.7347684Z 2022-05-18T04:18:51.7347915Z Running tests... 2022-05-18T04:18:51.7348348Z ---------------------------------------------------------------------- 2022-05-18T04:18:53.2671040Z test_rref_forward_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:18:53.3057474Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26941 2022-05-18T04:18:53.3164075Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26942 2022-05-18T04:18:53.3271861Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 26943 2022-05-18T04:18:53.3378940Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 26944 2022-05-18T04:18:54.2341784Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp03ka1hwt 2022-05-18T04:18:54.2343103Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp03ka1hwt/_remote_module_non_scriptable.py 2022-05-18T04:18:54.2414042Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptbyfvrfl 2022-05-18T04:18:54.2416593Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptbyfvrfl/_remote_module_non_scriptable.py 2022-05-18T04:18:54.2807882Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6snj8oi9 2022-05-18T04:18:54.2810371Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6snj8oi9/_remote_module_non_scriptable.py 2022-05-18T04:18:54.2978347Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbwze1po7 2022-05-18T04:18:54.2981057Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbwze1po7/_remote_module_non_scriptable.py 2022-05-18T04:18:54.6338417Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:18:54.6414218Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:18:54.6798966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:18:54.7044499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:19:06.4673654Z ok (14.732s) 2022-05-18T04:19:06.4674009Z 2022-05-18T04:19:06.4674711Z ---------------------------------------------------------------------- 2022-05-18T04:19:06.4675328Z Ran 1 test in 14.733s 2022-05-18T04:19:06.4675670Z 2022-05-18T04:19:06.4675846Z OK 2022-05-18T04:19:06.4676102Z 2022-05-18T04:19:06.4676301Z Generating XML reports... 
2022-05-18T04:19:06.4719870Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041851.xml 2022-05-18T04:19:07.6436021Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1qteo1ey 2022-05-18T04:19:07.6437197Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1qteo1ey/_remote_module_non_scriptable.py 2022-05-18T04:19:08.0540500Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:19:08.0556689Z 2022-05-18T04:19:08.0557127Z Running tests... 2022-05-18T04:19:08.0557657Z ---------------------------------------------------------------------- 2022-05-18T04:19:09.6465446Z test_rref_forward_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:19:09.6861441Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27286 2022-05-18T04:19:09.6971448Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27287 2022-05-18T04:19:09.7083347Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 27288 2022-05-18T04:19:09.7193664Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 27289 2022-05-18T04:19:10.6112182Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpulmz__cp 2022-05-18T04:19:10.6113304Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpulmz__cp/_remote_module_non_scriptable.py 2022-05-18T04:19:10.6543360Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjcbamde5 2022-05-18T04:19:10.6544956Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjcbamde5/_remote_module_non_scriptable.py 2022-05-18T04:19:10.7051592Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgm14mgf9 2022-05-18T04:19:10.7053979Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgm14mgf9/_remote_module_non_scriptable.py 2022-05-18T04:19:10.7060582Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp49or8h43 2022-05-18T04:19:10.7063656Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp49or8h43/_remote_module_non_scriptable.py 2022-05-18T04:19:11.0080811Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:19:11.0578094Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:19:11.1178189Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:19:11.1206398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:19:22.9498973Z ok (14.894s) 2022-05-18T04:19:22.9499207Z 2022-05-18T04:19:22.9499625Z ---------------------------------------------------------------------- 2022-05-18T04:19:22.9499972Z Ran 1 test in 14.894s 2022-05-18T04:19:22.9500141Z 2022-05-18T04:19:22.9500237Z OK 2022-05-18T04:19:22.9500354Z 2022-05-18T04:19:22.9500496Z Generating XML reports... 
2022-05-18T04:19:22.9543323Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041908.xml 2022-05-18T04:19:24.1181304Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp40cm39q5 2022-05-18T04:19:24.1182172Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp40cm39q5/_remote_module_non_scriptable.py 2022-05-18T04:19:24.5264919Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:19:24.5280186Z 2022-05-18T04:19:24.5280680Z Running tests... 2022-05-18T04:19:24.5281161Z ---------------------------------------------------------------------- 2022-05-18T04:19:26.1056324Z test_rref_forward_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:19:26.1449745Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27631 2022-05-18T04:19:26.1557068Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27632 2022-05-18T04:19:26.1665827Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 27633 2022-05-18T04:19:26.1775835Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 27634 2022-05-18T04:19:27.0673768Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp31_1omzu 2022-05-18T04:19:27.0674959Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp31_1omzu/_remote_module_non_scriptable.py 2022-05-18T04:19:27.0849045Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeuqvwnxh 2022-05-18T04:19:27.0851324Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeuqvwnxh/_remote_module_non_scriptable.py 2022-05-18T04:19:27.0959766Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb5p8eva4 2022-05-18T04:19:27.0961855Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb5p8eva4/_remote_module_non_scriptable.py 2022-05-18T04:19:27.1126361Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4aenova3 2022-05-18T04:19:27.1129362Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4aenova3/_remote_module_non_scriptable.py 2022-05-18T04:19:27.4709599Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:19:27.4923317Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:19:27.4978415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:19:27.5232172Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:19:39.2098063Z ok (14.681s) 2022-05-18T04:19:39.2098286Z 2022-05-18T04:19:39.2098677Z ---------------------------------------------------------------------- 2022-05-18T04:19:39.2099029Z Ran 1 test in 14.682s 2022-05-18T04:19:39.2099206Z 2022-05-18T04:19:39.2099301Z OK 2022-05-18T04:19:39.2099437Z 2022-05-18T04:19:39.2099574Z Generating XML reports... 
2022-05-18T04:19:39.2142592Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041924.xml 2022-05-18T04:19:40.3747300Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsws8kfe7 2022-05-18T04:19:40.3748617Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsws8kfe7/_remote_module_non_scriptable.py 2022-05-18T04:19:40.7857983Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:19:40.7872989Z 2022-05-18T04:19:40.7873377Z Running tests... 2022-05-18T04:19:40.7873815Z ---------------------------------------------------------------------- 2022-05-18T04:19:42.3742989Z test_rref_to_here_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:19:42.4136190Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27973 2022-05-18T04:19:42.4244386Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27974 2022-05-18T04:19:42.4353914Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 27975 2022-05-18T04:19:42.4464163Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 27976 2022-05-18T04:19:43.3483716Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxz1k51sl 2022-05-18T04:19:43.3484606Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxz1k51sl/_remote_module_non_scriptable.py 2022-05-18T04:19:43.3742016Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3lm0udg9 2022-05-18T04:19:43.3745049Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3lm0udg9/_remote_module_non_scriptable.py 2022-05-18T04:19:43.3749866Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3pk05hbk 2022-05-18T04:19:43.3752814Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3pk05hbk/_remote_module_non_scriptable.py 2022-05-18T04:19:43.3867553Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp23zu5qp2 2022-05-18T04:19:43.3870412Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp23zu5qp2/_remote_module_non_scriptable.py 2022-05-18T04:19:43.7454804Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:19:43.7750819Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:19:43.7794262Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:19:43.7955988Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:19:56.6785574Z ok (15.891s) 2022-05-18T04:19:56.6785799Z 2022-05-18T04:19:56.6786462Z ---------------------------------------------------------------------- 2022-05-18T04:19:56.6786827Z Ran 1 test in 15.891s 2022-05-18T04:19:56.6787024Z 2022-05-18T04:19:56.6787129Z OK 2022-05-18T04:19:56.6787265Z 2022-05-18T04:19:56.6787404Z Generating XML reports... 
2022-05-18T04:19:56.6830395Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041940.xml 2022-05-18T04:19:57.8500908Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpavagx223 2022-05-18T04:19:57.8502202Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpavagx223/_remote_module_non_scriptable.py 2022-05-18T04:19:58.2598694Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:19:58.2612768Z 2022-05-18T04:19:58.2613186Z Running tests... 2022-05-18T04:19:58.2614130Z ---------------------------------------------------------------------- 2022-05-18T04:19:59.8415167Z test_rref_to_here_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:19:59.8809082Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28316 2022-05-18T04:19:59.8918539Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28317 2022-05-18T04:19:59.9028827Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 28318 2022-05-18T04:19:59.9138373Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 28319 2022-05-18T04:20:00.8040430Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphsze_ed5 2022-05-18T04:20:00.8041627Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphsze_ed5/_remote_module_non_scriptable.py 2022-05-18T04:20:00.8507120Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkgcy1t1g 2022-05-18T04:20:00.8509196Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkgcy1t1g/_remote_module_non_scriptable.py 2022-05-18T04:20:00.8738606Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvppf7qfw 2022-05-18T04:20:00.8740856Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvppf7qfw/_remote_module_non_scriptable.py 2022-05-18T04:20:00.8847608Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplgozo_0l 2022-05-18T04:20:00.8849643Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplgozo_0l/_remote_module_non_scriptable.py 2022-05-18T04:20:01.2162148Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:20:01.2500340Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:20:01.2810687Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:20:01.2822922Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:20:16.2508997Z ok (17.989s) 2022-05-18T04:20:16.2509226Z 2022-05-18T04:20:16.2509631Z ---------------------------------------------------------------------- 2022-05-18T04:20:16.2510000Z Ran 1 test in 17.990s 2022-05-18T04:20:16.2511752Z 2022-05-18T04:20:16.2512093Z OK 2022-05-18T04:20:16.2512259Z 2022-05-18T04:20:16.2512411Z Generating XML reports... 
2022-05-18T04:20:16.2553457Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041958.xml 2022-05-18T04:20:17.4171830Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfkskbhc1 2022-05-18T04:20:17.4173287Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfkskbhc1/_remote_module_non_scriptable.py 2022-05-18T04:20:17.8259138Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:20:17.8273904Z 2022-05-18T04:20:17.8274589Z Running tests... 2022-05-18T04:20:17.8275089Z ---------------------------------------------------------------------- 2022-05-18T04:20:19.4011501Z test_rref_to_here_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:20:19.4397622Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28665 2022-05-18T04:20:19.4504455Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28666 2022-05-18T04:20:19.4613458Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 28667 2022-05-18T04:20:19.4722899Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 28668 2022-05-18T04:20:20.3776264Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf21riccs 2022-05-18T04:20:20.3776888Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf21riccs/_remote_module_non_scriptable.py 2022-05-18T04:20:20.4368938Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl59nics8 2022-05-18T04:20:20.4371364Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl59nics8/_remote_module_non_scriptable.py 2022-05-18T04:20:20.4441792Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp8tiyrd_ 2022-05-18T04:20:20.4444632Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp8tiyrd_/_remote_module_non_scriptable.py 2022-05-18T04:20:20.4692133Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_jqti14j 2022-05-18T04:20:20.4695049Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_jqti14j/_remote_module_non_scriptable.py 2022-05-18T04:20:20.7715147Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:20:20.8479798Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:20:20.8516641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:20:20.8785412Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:20:33.8042713Z ok (15.977s) 2022-05-18T04:20:33.8042940Z 2022-05-18T04:20:33.8043351Z ---------------------------------------------------------------------- 2022-05-18T04:20:33.8043844Z Ran 1 test in 15.977s 2022-05-18T04:20:33.8044118Z 2022-05-18T04:20:33.8044281Z OK 2022-05-18T04:20:33.8045325Z 2022-05-18T04:20:33.8045609Z Generating XML reports... 
2022-05-18T04:20:33.8086925Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042017.xml 2022-05-18T04:20:34.9856221Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfwq4818w 2022-05-18T04:20:34.9857493Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfwq4818w/_remote_module_non_scriptable.py 2022-05-18T04:20:35.3928660Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:20:35.3944032Z 2022-05-18T04:20:35.3944448Z Running tests... 2022-05-18T04:20:35.3944973Z ---------------------------------------------------------------------- 2022-05-18T04:20:36.9802895Z test_rref_to_here_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:20:37.0198801Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29008 2022-05-18T04:20:37.0307924Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29009 2022-05-18T04:20:37.0417785Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 29010 2022-05-18T04:20:37.0529044Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 29011 2022-05-18T04:20:37.9582888Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplc7u9rae 2022-05-18T04:20:37.9584305Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplc7u9rae/_remote_module_non_scriptable.py 2022-05-18T04:20:37.9794030Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_b9p8lei 2022-05-18T04:20:37.9796481Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_b9p8lei/_remote_module_non_scriptable.py 2022-05-18T04:20:37.9920594Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa0nzc50q 2022-05-18T04:20:37.9923400Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa0nzc50q/_remote_module_non_scriptable.py 2022-05-18T04:20:37.9994964Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpttjjw8c8 2022-05-18T04:20:37.9997201Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpttjjw8c8/_remote_module_non_scriptable.py 2022-05-18T04:20:38.3585119Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:20:38.3822209Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:20:38.3997365Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:20:38.4076788Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:20:53.4900504Z ok (18.095s) 2022-05-18T04:20:53.4900737Z 2022-05-18T04:20:53.4901223Z ---------------------------------------------------------------------- 2022-05-18T04:20:53.4901739Z Ran 1 test in 18.096s 2022-05-18T04:20:53.4901912Z 2022-05-18T04:20:53.4902021Z OK 2022-05-18T04:20:53.4902158Z 2022-05-18T04:20:53.4905581Z Generating XML reports... 
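The test_rref_as_arg_synchronization*, test_rref_forward_synchronization* and test_rref_to_here_synchronization* cases above all follow the same shape: four ranks join an RPC group with TensorPipe CUDA device maps, one rank creates an RRef to a CUDA tensor on a peer, and the value is brought back while the agent keeps the CUDA streams involved in sync. A minimal sketch of that pattern, assuming MASTER_ADDR/MASTER_PORT are set and a second process runs the "worker1" side, is:

    import torch
    import torch.distributed.rpc as rpc

    def make_cuda_tensor():
        # Runs on the callee; the result lives on the callee's cuda:0.
        return torch.ones(1000, 1000, device="cuda:0") * 2

    def run_worker0():
        opts = rpc.TensorPipeRpcBackendOptions()
        # Map this rank's cuda:0 onto worker1's cuda:0 for tensors sent either way.
        opts.set_device_map("worker1", {0: 0})
        rpc.init_rpc("worker0", rank=0, world_size=2, rpc_backend_options=opts)

        rref = rpc.remote("worker1", make_cuda_tensor)  # RRef owned by worker1
        local = rref.to_here()                          # copied onto worker0's cuda:0
        print(local.sum().item())
        rpc.shutdown()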
2022-05-18T04:20:53.4945520Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042035.xml 2022-05-18T04:20:54.6629200Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxge3s028 2022-05-18T04:20:54.6630024Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxge3s028/_remote_module_non_scriptable.py 2022-05-18T04:20:55.0719338Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:20:55.0733605Z 2022-05-18T04:20:55.0733913Z Running tests... 2022-05-18T04:20:55.0734400Z ---------------------------------------------------------------------- 2022-05-18T04:20:56.6477408Z test_rref_with_unpickleable_attributes (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:20:56.6874208Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29357 2022-05-18T04:20:56.6981985Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29358 2022-05-18T04:20:56.7091286Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 29359 2022-05-18T04:20:56.7202396Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 29360 2022-05-18T04:20:57.5910500Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbrcpj5gq 2022-05-18T04:20:57.5911590Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1vvgjj97 2022-05-18T04:20:57.5912664Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbrcpj5gq/_remote_module_non_scriptable.py 2022-05-18T04:20:57.5913749Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1vvgjj97/_remote_module_non_scriptable.py 2022-05-18T04:20:57.5984862Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpynz932m8 2022-05-18T04:20:57.5987378Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpynz932m8/_remote_module_non_scriptable.py 2022-05-18T04:20:57.6464401Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp46_o18i0 2022-05-18T04:20:57.6467317Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp46_o18i0/_remote_module_non_scriptable.py 2022-05-18T04:20:57.9956657Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:20:57.9968239Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:20:57.9981975Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:20:58.0575699Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:01.4332760Z ok (6.360s) 2022-05-18T04:21:01.4332990Z 2022-05-18T04:21:01.4333401Z ---------------------------------------------------------------------- 2022-05-18T04:21:01.4333730Z Ran 1 test in 6.360s 2022-05-18T04:21:01.4333901Z 2022-05-18T04:21:01.4334000Z OK 2022-05-18T04:21:01.4334136Z 2022-05-18T04:21:01.4334272Z Generating XML reports... 
2022-05-18T04:21:01.4378437Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042055.xml 2022-05-18T04:21:02.6183583Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa_spv7l8 2022-05-18T04:21:02.6184761Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa_spv7l8/_remote_module_non_scriptable.py 2022-05-18T04:21:03.0265370Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:21:03.0280149Z 2022-05-18T04:21:03.0280404Z Running tests... 2022-05-18T04:21:03.0280839Z ---------------------------------------------------------------------- 2022-05-18T04:21:04.6115414Z test_tensor_view_as_return_value (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:21:04.6501201Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29704 2022-05-18T04:21:04.6610672Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29705 2022-05-18T04:21:04.6721442Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 29706 2022-05-18T04:21:04.6835342Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 29707 2022-05-18T04:21:05.6298796Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz0uokp6z 2022-05-18T04:21:05.6299438Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz0uokp6z/_remote_module_non_scriptable.py 2022-05-18T04:21:05.6565996Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpud1zibtt 2022-05-18T04:21:05.6568350Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpud1zibtt/_remote_module_non_scriptable.py 2022-05-18T04:21:05.7038037Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqhf1zo1o 2022-05-18T04:21:05.7040850Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqhf1zo1o/_remote_module_non_scriptable.py 2022-05-18T04:21:05.7052596Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkcd888di 2022-05-18T04:21:05.7055515Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkcd888di/_remote_module_non_scriptable.py 2022-05-18T04:21:06.0278749Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:06.0562875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:06.1078958Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:21:06.1079515Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:21:11.6014377Z ok (8.573s) 2022-05-18T04:21:11.6014735Z 2022-05-18T04:21:11.6015277Z ---------------------------------------------------------------------- 2022-05-18T04:21:11.6015605Z Ran 1 test in 8.573s 2022-05-18T04:21:11.6015783Z 2022-05-18T04:21:11.6015878Z OK 2022-05-18T04:21:11.6016351Z 2022-05-18T04:21:11.6016504Z Generating XML reports... 
2022-05-18T04:21:11.6060188Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042103.xml 2022-05-18T04:21:12.7667454Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxj4ihm37 2022-05-18T04:21:12.7668672Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxj4ihm37/_remote_module_non_scriptable.py 2022-05-18T04:21:13.1640608Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:21:13.1654488Z 2022-05-18T04:21:13.1654767Z Running tests... 2022-05-18T04:21:13.1655192Z ---------------------------------------------------------------------- 2022-05-18T04:21:14.7250903Z test_device_maps_backward_pass (__main__.TensorPipeTensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:21:14.7637657Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30347 2022-05-18T04:21:14.7745437Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30348 2022-05-18T04:21:14.7852725Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 30349 2022-05-18T04:21:14.7962608Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 30350 2022-05-18T04:21:15.7116206Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfbgx1vxz 2022-05-18T04:21:15.7117390Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfbgx1vxz/_remote_module_non_scriptable.py 2022-05-18T04:21:15.7635474Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2vie_rck 2022-05-18T04:21:15.7636970Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2vie_rck/_remote_module_non_scriptable.py 2022-05-18T04:21:15.7645733Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzucdnz15 2022-05-18T04:21:15.7649190Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzucdnz15/_remote_module_non_scriptable.py 2022-05-18T04:21:15.8067861Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpunzv3170 2022-05-18T04:21:15.8069784Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpunzv3170/_remote_module_non_scriptable.py 2022-05-18T04:21:16.1089884Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:16.1703070Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:16.1711118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:21:16.2126515Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:21:16.4022755Z skip: Need at least 4 CUDA devices (3.236s) 2022-05-18T04:21:16.4023005Z 2022-05-18T04:21:16.4023389Z ---------------------------------------------------------------------- 2022-05-18T04:21:16.4023956Z Ran 1 test in 3.237s 2022-05-18T04:21:16.4024123Z 2022-05-18T04:21:16.4024242Z OK (skipped=1) 2022-05-18T04:21:16.4024399Z 2022-05-18T04:21:16.4024525Z Generating XML reports... 
2022-05-18T04:21:16.4067492Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518042113.xml 2022-05-18T04:21:17.5713381Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp167pj888 2022-05-18T04:21:17.5714836Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp167pj888/_remote_module_non_scriptable.py 2022-05-18T04:21:17.9833160Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:21:17.9847977Z 2022-05-18T04:21:17.9848119Z Running tests... 2022-05-18T04:21:17.9848556Z ---------------------------------------------------------------------- 2022-05-18T04:21:19.5631469Z test_dist_autograd_sync_streams (__main__.TensorPipeTensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:21:19.6025707Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30518 2022-05-18T04:21:19.6134951Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30519 2022-05-18T04:21:19.6243605Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 30520 2022-05-18T04:21:19.6354694Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 30521 2022-05-18T04:21:20.5716571Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj__jxadv 2022-05-18T04:21:20.5717660Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj__jxadv/_remote_module_non_scriptable.py 2022-05-18T04:21:20.5779598Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppo7no_c_ 2022-05-18T04:21:20.5782459Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppo7no_c_/_remote_module_non_scriptable.py 2022-05-18T04:21:20.5824904Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_d94fzzt 2022-05-18T04:21:20.5827884Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_d94fzzt/_remote_module_non_scriptable.py 2022-05-18T04:21:20.5872821Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprbudnrux 2022-05-18T04:21:20.5875748Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprbudnrux/_remote_module_non_scriptable.py 2022-05-18T04:21:20.9768722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:20.9870373Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:21:20.9905525Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:21:20.9920235Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:21.2411331Z skip: Need at least 4 CUDA devices (3.256s) 2022-05-18T04:21:21.2411662Z 2022-05-18T04:21:21.2412190Z ---------------------------------------------------------------------- 2022-05-18T04:21:21.2412537Z Ran 1 test in 3.256s 2022-05-18T04:21:21.2412700Z 2022-05-18T04:21:21.2412793Z OK (skipped=1) 2022-05-18T04:21:21.2412947Z 2022-05-18T04:21:21.2413083Z Generating XML reports... 
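The TensorPipeTensorPipeCudaDistAutogradTest cases in this run are skipped because the runner exposes fewer than four CUDA devices; the skip message is produced after the worker processes spawn but before the test body runs. The PyTorch distributed suite has its own helper for this gate (skip_if_lt_x_gpu in torch.testing._internal.common_distributed); a plain-unittest equivalent of the same check looks like:

    import unittest
    import torch

    class CudaDistAutogradSketch(unittest.TestCase):
        @unittest.skipUnless(
            torch.cuda.device_count() >= 4, "Need at least 4 CUDA devices"
        )
        def test_device_maps_backward_pass(self):
            # Body only runs on hosts with 4+ visible GPUs.
            self.assertGreaterEqual(torch.cuda.device_count(), 4)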
2022-05-18T04:21:21.2455849Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518042117.xml 2022-05-18T04:21:22.4105022Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeai13cc0 2022-05-18T04:21:22.4106946Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeai13cc0/_remote_module_non_scriptable.py 2022-05-18T04:21:22.8203385Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:21:22.8218362Z 2022-05-18T04:21:22.8218529Z Running tests... 2022-05-18T04:21:22.8219217Z ---------------------------------------------------------------------- 2022-05-18T04:21:24.4100502Z test_gradients_synchronizations (__main__.TensorPipeTensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:21:24.4495737Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30689 2022-05-18T04:21:24.4607438Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30690 2022-05-18T04:21:24.4717620Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 30691 2022-05-18T04:21:24.4828593Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 30692 2022-05-18T04:21:25.4240770Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzjcfisxt 2022-05-18T04:21:25.4241690Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzjcfisxt/_remote_module_non_scriptable.py 2022-05-18T04:21:25.4459971Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbt0hrq8i 2022-05-18T04:21:25.4462865Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbt0hrq8i/_remote_module_non_scriptable.py 2022-05-18T04:21:25.4655917Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpce2b0ir9 2022-05-18T04:21:25.4658819Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpce2b0ir9/_remote_module_non_scriptable.py 2022-05-18T04:21:25.4875262Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkrhydmq8 2022-05-18T04:21:25.4877770Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkrhydmq8/_remote_module_non_scriptable.py 2022-05-18T04:21:25.8270214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:25.8688104Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:21:25.8820043Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:25.8929142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:21:26.0885875Z skip: Need at least 4 CUDA devices (3.266s) 2022-05-18T04:21:26.0886247Z 2022-05-18T04:21:26.0886678Z ---------------------------------------------------------------------- 2022-05-18T04:21:26.0887336Z Ran 1 test in 3.267s 2022-05-18T04:21:26.0887506Z 2022-05-18T04:21:26.0887620Z OK (skipped=1) 2022-05-18T04:21:26.0887780Z 2022-05-18T04:21:26.0887888Z Generating XML reports... 2022-05-18T04:21:26.0930346Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518042122.xml 2022-05-18T04:21:26.5124432Z Running distributed/fsdp/test_fsdp_core ... 
[2022-05-18 04:21:26.511909] 2022-05-18T04:21:26.5125160Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_core.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 04:21:26.512022] 2022-05-18T04:21:27.4019196Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_upt2fns 2022-05-18T04:21:27.4021060Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_upt2fns/_remote_module_non_scriptable.py 2022-05-18T04:21:27.4288793Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_core 2022-05-18T04:21:27.4337636Z 2022-05-18T04:21:27.4337900Z Running tests... 2022-05-18T04:21:27.4338349Z ---------------------------------------------------------------------- 2022-05-18T04:21:28.9791263Z test_backward_hooks_after_save (__main__.TestHooks) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:21:29.0178266Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30860 2022-05-18T04:21:29.0287933Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30861 2022-05-18T04:21:29.9196018Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfuam6gzt 2022-05-18T04:21:29.9197225Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfuam6gzt/_remote_module_non_scriptable.py 2022-05-18T04:21:29.9420260Z dist init r=0, world=2 2022-05-18T04:21:29.9424956Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:21:29.9574021Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6s927__k 2022-05-18T04:21:29.9576814Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6s927__k/_remote_module_non_scriptable.py 2022-05-18T04:21:29.9790017Z dist init r=1, world=2 2022-05-18T04:21:29.9794471Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:21:29.9795446Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:29.9833683Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:31.3383909Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:31.3384484Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:31.3697401Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:21:31.3698075Z warnings.warn( 2022-05-18T04:21:31.3733881Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:21:31.3734429Z warnings.warn( 2022-05-18T04:21:32.5384003Z ok (5.104s) 2022-05-18T04:21:32.5527344Z test_output_backward_hooks_cuda_first_False (__main__.TestHooks) ... 
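The FSDP suite is launched exactly as logged (python distributed/fsdp/test_fsdp_core.py -v --import-slow-tests --import-disabled-tests from the test directory). The UserWarning it prints comes from handing FullyShardedDataParallel a module that still lives on the CPU, so FSDP temporarily moves it to the rank's GPU for sharding and back again; the cuda_first_True variants below do not print it. A hedged sketch of the wrapping step that avoids that round trip, with placeholder model and rank names not taken from the test file, is:

    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_for_rank(rank: int) -> FSDP:
        # Assumes torch.distributed.init_process_group() has already run.
        model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
        model = model.cuda(rank)   # put parameters on this rank's GPU first...
        return FSDP(model)         # ...so FSDP shards in place, without the CPU warning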
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30943 2022-05-18T04:21:32.5633228Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30944 2022-05-18T04:21:33.4535933Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5mzjliqc 2022-05-18T04:21:33.4536753Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgyl68rx8 2022-05-18T04:21:33.4537320Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5mzjliqc/_remote_module_non_scriptable.py 2022-05-18T04:21:33.4538127Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgyl68rx8/_remote_module_non_scriptable.py 2022-05-18T04:21:33.4752951Z dist init r=0, world=2 2022-05-18T04:21:33.4757365Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:21:33.4762779Z dist init r=1, world=2 2022-05-18T04:21:33.4767124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:21:33.4768164Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:33.4860994Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:34.8253902Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:34.8254439Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:34.8538835Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:21:34.8539419Z warnings.warn( 2022-05-18T04:21:34.8573606Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:21:34.8574162Z warnings.warn( 2022-05-18T04:21:35.9723321Z ok (3.434s) 2022-05-18T04:21:35.9858769Z test_output_backward_hooks_cuda_first_True (__main__.TestHooks) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31026 2022-05-18T04:21:35.9967798Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31027 2022-05-18T04:21:36.8972411Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjqvmv5rb 2022-05-18T04:21:36.8973343Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjqvmv5rb/_remote_module_non_scriptable.py 2022-05-18T04:21:36.9054011Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxo7r_k94 2022-05-18T04:21:36.9056798Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxo7r_k94/_remote_module_non_scriptable.py 2022-05-18T04:21:36.9189718Z dist init r=0, world=2 2022-05-18T04:21:36.9193816Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:21:36.9278639Z dist init r=1, world=2 2022-05-18T04:21:36.9283060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:21:36.9283988Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:36.9297054Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:38.2676913Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:38.2677467Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:39.4057114Z ok (3.433s) 2022-05-18T04:21:39.4073748Z test_register_functions_called_cuda_first_False_mixed_precision_False (__main__.TestHooks) 2022-05-18T04:21:39.4202343Z Tests that _register_{pre|post}_backward_hooks called during forward. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31109 2022-05-18T04:21:39.4308044Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31110 2022-05-18T04:21:40.3314834Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1ej9zne1 2022-05-18T04:21:40.3315741Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1ej9zne1/_remote_module_non_scriptable.py 2022-05-18T04:21:40.3339771Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc5f0a3g1 2022-05-18T04:21:40.3342532Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc5f0a3g1/_remote_module_non_scriptable.py 2022-05-18T04:21:40.3532838Z dist init r=1, world=2 2022-05-18T04:21:40.3537237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:21:40.3562907Z dist init r=0, world=2 2022-05-18T04:21:40.3567083Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:21:40.3568418Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:40.3640689Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:21:41.7143416Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:41.7144380Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:41.7455935Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:21:41.7456498Z warnings.warn( 2022-05-18T04:21:41.7458733Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:21:41.7459280Z warnings.warn( 2022-05-18T04:21:42.8411529Z ok (3.435s) 2022-05-18T04:21:42.8424123Z test_register_functions_called_cuda_first_False_mixed_precision_True (__main__.TestHooks) 2022-05-18T04:21:42.8549527Z Tests that _register_{pre|post}_backward_hooks called during forward. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31188 2022-05-18T04:21:42.8655437Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31189 2022-05-18T04:21:43.7559876Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgusq_7ds 2022-05-18T04:21:43.7561327Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgusq_7ds/_remote_module_non_scriptable.py 2022-05-18T04:21:43.7631722Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8t6godla 2022-05-18T04:21:43.7634011Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8t6godla/_remote_module_non_scriptable.py 2022-05-18T04:21:43.7782843Z dist init r=0, world=2 2022-05-18T04:21:43.7787472Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:21:43.7856894Z dist init r=1, world=2 2022-05-18T04:21:43.7861135Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:21:43.7862226Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:43.7890743Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:45.1236651Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:45.1237183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:45.1536983Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:21:45.1537945Z warnings.warn( 2022-05-18T04:21:45.1539026Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:21:45.1539550Z warnings.warn( 2022-05-18T04:21:46.2744753Z ok (3.433s) 2022-05-18T04:21:46.2757889Z test_register_functions_called_cuda_first_True_mixed_precision_False (__main__.TestHooks) 2022-05-18T04:21:46.2881804Z Tests that _register_{pre|post}_backward_hooks called during forward. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31267 2022-05-18T04:21:46.2988074Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31268 2022-05-18T04:21:47.2339313Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkgm5irtp 2022-05-18T04:21:47.2340687Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkgm5irtp/_remote_module_non_scriptable.py 2022-05-18T04:21:47.2449421Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjbl48aow 2022-05-18T04:21:47.2452206Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjbl48aow/_remote_module_non_scriptable.py 2022-05-18T04:21:47.2558601Z dist init r=1, world=2 2022-05-18T04:21:47.2562725Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:21:47.2675700Z dist init r=0, world=2 2022-05-18T04:21:47.2680092Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:21:47.2681065Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:47.2768006Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:48.6236425Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:48.6236985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:49.7076017Z ok (3.433s) 2022-05-18T04:21:49.7088393Z test_register_functions_called_cuda_first_True_mixed_precision_True (__main__.TestHooks) 2022-05-18T04:21:49.7213558Z Tests that _register_{pre|post}_backward_hooks called during forward. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31346 2022-05-18T04:21:49.7319538Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31347 2022-05-18T04:21:50.6622128Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp28sqfi29 2022-05-18T04:21:50.6623341Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp28sqfi29/_remote_module_non_scriptable.py 2022-05-18T04:21:50.6729295Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_39f389o 2022-05-18T04:21:50.6732041Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_39f389o/_remote_module_non_scriptable.py 2022-05-18T04:21:50.6838015Z dist init r=0, world=2 2022-05-18T04:21:50.6842256Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:21:50.6953504Z dist init r=1, world=2 2022-05-18T04:21:50.6957619Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:21:50.6958665Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:50.7047589Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
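The repeated UserWarning from fully_sharded_data_parallel.py:911 is raised because the module handed to FSDP still lives on the CPU, so FSDP moves it to the rank's GPU for flattening and sharding and then moves it back. A minimal sketch of the usual way to avoid the warning, assuming a process group has already been initialized as in the sketch above, is to place the module on the local device before wrapping:

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_device(module: torch.nn.Module, rank: int) -> FSDP:
    # With the module already on this rank's GPU, FSDP does not need to
    # relocate it for parameter verification, flattening, and sharding.
    module = module.to(torch.device("cuda", rank))
    return FSDP(module)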
2022-05-18T04:21:52.0463338Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:52.0463886Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:53.1413138Z ok (3.434s) 2022-05-18T04:21:53.1550518Z test_transformer_no_grad_mixed_precision_False (__main__.TestNoGrad) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31425 2022-05-18T04:21:53.1653348Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31426 2022-05-18T04:21:54.0467306Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzq4iutzp 2022-05-18T04:21:54.0468568Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzq4iutzp/_remote_module_non_scriptable.py 2022-05-18T04:21:54.0685795Z dist init r=1, world=2 2022-05-18T04:21:54.0690426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:21:54.0883539Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3j8fxlj0 2022-05-18T04:21:54.0886393Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3j8fxlj0/_remote_module_non_scriptable.py 2022-05-18T04:21:54.1098465Z dist init r=0, world=2 2022-05-18T04:21:54.1102871Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:21:54.1104705Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:54.1200933Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:55.4267805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:55.4268526Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:55.4575734Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:21:55.4576633Z warnings.warn( 2022-05-18T04:21:55.4578187Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:21:55.4578757Z warnings.warn( 2022-05-18T04:21:56.5739805Z ok (3.432s) 2022-05-18T04:21:56.5874064Z test_transformer_no_grad_mixed_precision_True (__main__.TestNoGrad) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31508 2022-05-18T04:21:56.5978250Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31509 2022-05-18T04:21:57.4866629Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2tjap6b7 2022-05-18T04:21:57.4867983Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2tjap6b7/_remote_module_non_scriptable.py 2022-05-18T04:21:57.5089624Z dist init r=1, world=2 2022-05-18T04:21:57.5094492Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:21:57.5263665Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbckwmx95 2022-05-18T04:21:57.5266607Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbckwmx95/_remote_module_non_scriptable.py 2022-05-18T04:21:57.5479303Z dist init r=0, world=2 2022-05-18T04:21:57.5483370Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:21:57.5484408Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:57.5502833Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:21:58.8876346Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:58.8877887Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:58.9177935Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:21:58.9179055Z warnings.warn( 2022-05-18T04:21:58.9180568Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:21:58.9181584Z warnings.warn( 2022-05-18T04:21:59.4052863Z ok (2.831s) 2022-05-18T04:21:59.4189970Z test_param_change_after_init_mixed_precision_False (__main__.TestParamInit) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31591 2022-05-18T04:21:59.4295732Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31592 2022-05-18T04:22:00.3220890Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8ithlnvf 2022-05-18T04:22:00.3221749Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8ithlnvf/_remote_module_non_scriptable.py 2022-05-18T04:22:00.3238863Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3g0bfet1 2022-05-18T04:22:00.3241745Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3g0bfet1/_remote_module_non_scriptable.py 2022-05-18T04:22:00.3437416Z dist init r=1, world=2 2022-05-18T04:22:00.3441775Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:00.3465733Z dist init r=0, world=2 2022-05-18T04:22:00.3470493Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:00.3471810Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:00.3545670Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:01.7020624Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:01.7336210Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:01.7337583Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:01.7338176Z warnings.warn( 2022-05-18T04:22:01.7338918Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:01.7339469Z warnings.warn( 2022-05-18T04:22:02.8384641Z ok (3.433s) 2022-05-18T04:22:02.8523156Z test_param_change_after_init_mixed_precision_True (__main__.TestParamInit) ... 
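The mixed_precision_True variants of these parameterized tests exercise FSDP's mixed-precision path. As a rough sketch only, assuming the MixedPrecision policy object exported by torch.distributed.fsdp in builds of this vintage, and with illustrative dtypes rather than whatever the tests actually use:

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision

def wrap_fp16(module: torch.nn.Module, rank: int) -> FSDP:
    module = module.to(torch.device("cuda", rank))
    # Keep parameters, gradient reduction, and buffers in float16 inside FSDP
    # while the optimizer still sees the float32 master weights.
    policy = MixedPrecision(
        param_dtype=torch.float16,
        reduce_dtype=torch.float16,
        buffer_dtype=torch.float16,
    )
    return FSDP(module, mixed_precision=policy)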
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31670 2022-05-18T04:22:02.8630240Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31671 2022-05-18T04:22:03.7531553Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqp5xmb7y 2022-05-18T04:22:03.7532671Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqp5xmb7y/_remote_module_non_scriptable.py 2022-05-18T04:22:03.7757106Z dist init r=1, world=2 2022-05-18T04:22:03.7761727Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:03.7836839Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_vivh6xz 2022-05-18T04:22:03.7839257Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_vivh6xz/_remote_module_non_scriptable.py 2022-05-18T04:22:03.8051089Z dist init r=0, world=2 2022-05-18T04:22:03.8055229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:03.8056012Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:03.8067640Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:05.1594631Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:05.1595168Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:05.1895530Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:05.1896124Z warnings.warn( 2022-05-18T04:22:05.1896866Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:05.1897412Z warnings.warn( 2022-05-18T04:22:06.2719250Z ok (3.433s) 2022-05-18T04:22:06.2858982Z test_delayed_optim_step_offload_false_none_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31749 2022-05-18T04:22:06.2962910Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31750 2022-05-18T04:22:07.1933399Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphfo24axk 2022-05-18T04:22:07.1934409Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphfo24axk/_remote_module_non_scriptable.py 2022-05-18T04:22:07.1959370Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdnapfnnn 2022-05-18T04:22:07.1962292Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdnapfnnn/_remote_module_non_scriptable.py 2022-05-18T04:22:07.2149868Z dist init r=1, world=2 2022-05-18T04:22:07.2153887Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:07.2185064Z dist init r=0, world=2 2022-05-18T04:22:07.2189905Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:07.2191081Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:07.2257559Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:08.5783634Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:08.5784681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:09.0319625Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:09.0328724Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:09.0353561Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:09.0354138Z warnings.warn( 2022-05-18T04:22:09.0361610Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:09.0362151Z warnings.warn( 2022-05-18T04:22:09.6367497Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:09.6368549Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:09.6369558Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:09.6370224Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:09.7943447Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:09.7943955Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
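The FutureWarning above documents its own fix: torch.testing.assert_allclose() is deprecated in favor of torch.testing.assert_close(). A small migration example:

import torch
from torch.testing import assert_close

a = torch.tensor([1.0, 2.0, 3.0])
b = a.clone()

# Old: torch.testing.assert_allclose(a, b)
# New: raises an AssertionError if the tensors differ beyond the default,
# dtype-dependent tolerances.
assert_close(a, b)
# Tolerances can still be set explicitly (rtol and atol must be given together):
assert_close(a, b, rtol=1.3e-6, atol=1e-5)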
2022-05-18T04:22:10.6065629Z ok (4.334s) 2022-05-18T04:22:10.6195800Z test_delayed_optim_step_offload_false_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31832 2022-05-18T04:22:10.6300790Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31833 2022-05-18T04:22:11.5459697Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgsb3l4t1 2022-05-18T04:22:11.5460584Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgsb3l4t1/_remote_module_non_scriptable.py 2022-05-18T04:22:11.5598772Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_5ty_qe5 2022-05-18T04:22:11.5601442Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_5ty_qe5/_remote_module_non_scriptable.py 2022-05-18T04:22:11.5684273Z dist init r=0, world=2 2022-05-18T04:22:11.5688955Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:11.5815785Z dist init r=1, world=2 2022-05-18T04:22:11.5819670Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:11.5820592Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:11.5894565Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:12.9310684Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:12.9311528Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:13.3856676Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:13.3865586Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:13.3889954Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:13.3890547Z warnings.warn( 2022-05-18T04:22:13.3898978Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:13.3899510Z warnings.warn( 2022-05-18T04:22:14.1714286Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:14.1714989Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:14.1716669Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:14.1717592Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:14.4272317Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
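The "Reducer buckets have been rebuilt in this iteration" INFO lines come from the DistributedDataParallel baseline that TestParityWithDDP compares against: DDP groups gradients into fixed-size buckets and rebuilds their layout after the first iteration. A minimal sketch of constructing such a baseline (the bucket size shown is DDP's documented default, not a value taken from the test):

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_ddp(module: torch.nn.Module, rank: int) -> DDP:
    module = module.to(torch.device("cuda", rank))
    # device_ids pins the replica to this rank's GPU; bucket_cap_mb controls
    # the gradient buckets whose layout the reducer reports rebuilding.
    return DDP(module, device_ids=[rank], bucket_cap_mb=25)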
2022-05-18T04:22:14.4272868Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:15.5413470Z ok (4.935s) 2022-05-18T04:22:15.5546570Z test_delayed_optim_step_offload_false_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31915 2022-05-18T04:22:15.5651440Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31916 2022-05-18T04:22:16.4559553Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph2fkdmxr 2022-05-18T04:22:16.4560705Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph2fkdmxr/_remote_module_non_scriptable.py 2022-05-18T04:22:16.4610830Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn7r0lkg0 2022-05-18T04:22:16.4613648Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn7r0lkg0/_remote_module_non_scriptable.py 2022-05-18T04:22:16.4773875Z dist init r=0, world=2 2022-05-18T04:22:16.4778017Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:16.4837210Z dist init r=1, world=2 2022-05-18T04:22:16.4841553Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:16.4842808Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:16.4882122Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:17.8153799Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:17.8154331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:18.2667625Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:18.2668131Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:18.2700060Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:18.2700645Z warnings.warn( 2022-05-18T04:22:18.2701414Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:18.2701965Z warnings.warn( 2022-05-18T04:22:19.0505459Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:19.0506161Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:19.0507624Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:22:19.0508297Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:19.3060508Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:19.3061233Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:20.3765461Z ok (4.835s) 2022-05-18T04:22:20.3898423Z test_delayed_optim_step_offload_false_prefetch_post_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31998 2022-05-18T04:22:20.4004286Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31999 2022-05-18T04:22:21.3008697Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6xxockvo 2022-05-18T04:22:21.3009556Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6xxockvo/_remote_module_non_scriptable.py 2022-05-18T04:22:21.3032177Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppljpg365 2022-05-18T04:22:21.3035130Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppljpg365/_remote_module_non_scriptable.py 2022-05-18T04:22:21.3222541Z dist init r=0, world=2 2022-05-18T04:22:21.3227035Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:21.3258132Z dist init r=1, world=2 2022-05-18T04:22:21.3262602Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:21.3264158Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:21.3330618Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:22.6606823Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:22.6607356Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:23.1166621Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:23.1167132Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:23.1199574Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:23.1200234Z warnings.warn( 2022-05-18T04:22:23.1201221Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:23.1201787Z warnings.warn( 2022-05-18T04:22:23.8901360Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:23.8902107Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:23.8903041Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:23.8903906Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:24.1454206Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:24.1454712Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:25.2116159Z ok (4.835s) 2022-05-18T04:22:25.2250415Z test_delayed_optim_step_offload_false_prefetch_post_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32081 2022-05-18T04:22:25.2352926Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32082 2022-05-18T04:22:26.1291791Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9xi9clv4 2022-05-18T04:22:26.1292900Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9xi9clv4/_remote_module_non_scriptable.py 2022-05-18T04:22:26.1512581Z dist init r=1, world=2 2022-05-18T04:22:26.1516702Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:26.1583196Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpowst6ip9 2022-05-18T04:22:26.1586214Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpowst6ip9/_remote_module_non_scriptable.py 2022-05-18T04:22:26.1798814Z dist init r=0, world=2 2022-05-18T04:22:26.1802754Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:26.1803845Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:26.1823262Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:27.5013643Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:27.5014444Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:27.9551948Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:27.9563103Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:27.9586045Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:27.9586627Z warnings.warn( 2022-05-18T04:22:27.9597308Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:27.9597894Z warnings.warn( 2022-05-18T04:22:28.7410068Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:22:28.7411000Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:28.7412771Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:28.7413429Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:28.9969490Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:28.9970017Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:30.0464433Z ok (4.835s) 2022-05-18T04:22:30.0598511Z test_delayed_optim_step_offload_false_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32164 2022-05-18T04:22:30.0703375Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32165 2022-05-18T04:22:30.9806405Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_lwl90mi 2022-05-18T04:22:30.9808819Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_lwl90mi/_remote_module_non_scriptable.py 2022-05-18T04:22:31.0028614Z dist init r=0, world=2 2022-05-18T04:22:31.0032861Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:31.0074301Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy5qc2uo7 2022-05-18T04:22:31.0077312Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy5qc2uo7/_remote_module_non_scriptable.py 2022-05-18T04:22:31.0298997Z dist init r=1, world=2 2022-05-18T04:22:31.0303571Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:31.0305219Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:31.0339738Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:32.3575792Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:32.3576516Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:32.8051230Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:32.8052142Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:32.8084627Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:32.8085253Z warnings.warn( 2022-05-18T04:22:32.8086021Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:32.8086559Z warnings.warn( 2022-05-18T04:22:33.5886927Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:33.5887648Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:33.5888568Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:33.5889243Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:33.8439057Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:33.8439617Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:34.8814915Z ok (4.835s) 2022-05-18T04:22:34.8946287Z test_delayed_optim_step_offload_false_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32247 2022-05-18T04:22:34.9051426Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32248 2022-05-18T04:22:35.8099058Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzhtybl45 2022-05-18T04:22:35.8100146Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzhtybl45/_remote_module_non_scriptable.py 2022-05-18T04:22:35.8317663Z dist init r=0, world=2 2022-05-18T04:22:35.8321923Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:35.8338059Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7if1cw6w 2022-05-18T04:22:35.8340665Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7if1cw6w/_remote_module_non_scriptable.py 2022-05-18T04:22:35.8554367Z dist init r=1, world=2 2022-05-18T04:22:35.8558398Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:35.8559466Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:35.8628010Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:37.1906632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:37.1907174Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:37.6501992Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:37.6502778Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:37.6534494Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:37.6535103Z warnings.warn( 2022-05-18T04:22:37.6535858Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:22:37.6536401Z warnings.warn( 2022-05-18T04:22:38.4234048Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:38.4234754Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:38.4235701Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:38.4236366Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:38.6785483Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:38.6785992Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:39.7163510Z ok (4.835s) 2022-05-18T04:22:39.7296680Z test_delayed_optim_step_offload_false_prefetch_pre_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32330 2022-05-18T04:22:39.7401729Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32331 2022-05-18T04:22:40.6237712Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0h0gw15n 2022-05-18T04:22:40.6238541Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0h0gw15n/_remote_module_non_scriptable.py 2022-05-18T04:22:40.6278079Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb4x5tp7j 2022-05-18T04:22:40.6280880Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb4x5tp7j/_remote_module_non_scriptable.py 2022-05-18T04:22:40.6454041Z dist init r=1, world=2 2022-05-18T04:22:40.6458292Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:40.6500443Z dist init r=0, world=2 2022-05-18T04:22:40.6504905Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:40.6505878Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:40.6562284Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:41.9869697Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:41.9870591Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:42.4438682Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:42.4448298Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:42.4471313Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:22:42.4471870Z warnings.warn( 2022-05-18T04:22:42.4481833Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:42.4482377Z warnings.warn( 2022-05-18T04:22:43.2287052Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:43.2287766Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:43.2288708Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:43.2289365Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:43.4837499Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:43.4838002Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:44.5514890Z ok (4.835s) 2022-05-18T04:22:44.5649459Z test_delayed_optim_step_offload_false_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32413 2022-05-18T04:22:44.5753113Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32414 2022-05-18T04:22:45.4734298Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpka86zs37 2022-05-18T04:22:45.4735467Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpka86zs37/_remote_module_non_scriptable.py 2022-05-18T04:22:45.4918580Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfxo9v1d1 2022-05-18T04:22:45.4921345Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfxo9v1d1/_remote_module_non_scriptable.py 2022-05-18T04:22:45.4959218Z dist init r=1, world=2 2022-05-18T04:22:45.4963834Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:45.5136871Z dist init r=0, world=2 2022-05-18T04:22:45.5141457Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:45.5142268Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:45.5169164Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:46.8453808Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:46.8454521Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:47.2975766Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:47.2985314Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
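The test-name suffixes in this block (none / no_shard / shard_grad_op for the sharding strategy, prefetch_pre / prefetch_post for backward prefetching) map onto FSDP constructor options. A hedged sketch of one such combination, assuming the ShardingStrategy and BackwardPrefetch enums exported by torch.distributed.fsdp in builds of this vintage:

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import BackwardPrefetch, ShardingStrategy

def wrap_variant(module: torch.nn.Module, rank: int) -> FSDP:
    module = module.to(torch.device("cuda", rank))
    # "shard_grad_op": shard gradients and optimizer state while keeping
    # parameters unsharded after forward; "prefetch_pre": prefetch the next
    # parameters before the current backward computation. These names mirror
    # the test suffixes above.
    return FSDP(
        module,
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
        backward_prefetch=BackwardPrefetch.BACKWARD_PRE,
    )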
2022-05-18T04:22:47.3009106Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:47.3009985Z warnings.warn( 2022-05-18T04:22:47.3019094Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:47.3019786Z warnings.warn( 2022-05-18T04:22:48.0831816Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:48.0832530Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:48.0833475Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:48.0834137Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:48.3387617Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:48.3388123Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:49.3864296Z ok (4.835s) 2022-05-18T04:22:49.3997819Z test_delayed_optim_step_offload_true_none_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32496 2022-05-18T04:22:49.4102735Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32497 2022-05-18T04:22:50.3103511Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3yhi_k__ 2022-05-18T04:22:50.3104870Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3yhi_k__/_remote_module_non_scriptable.py 2022-05-18T04:22:50.3148145Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzzy4lwu3 2022-05-18T04:22:50.3150971Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzzy4lwu3/_remote_module_non_scriptable.py 2022-05-18T04:22:50.3321339Z dist init r=0, world=2 2022-05-18T04:22:50.3325765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:50.3371607Z dist init r=1, world=2 2022-05-18T04:22:50.3376029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:50.3377058Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:50.3429491Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:51.6783452Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:51.6784376Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:52.1377478Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
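The offload_true variants that start here additionally enable CPU offloading of the sharded parameters. A minimal sketch, assuming the CPUOffload config object from torch.distributed.fsdp:

import torch
from torch.distributed.fsdp import CPUOffload
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_with_cpu_offload(module: torch.nn.Module, rank: int) -> FSDP:
    module = module.to(torch.device("cuda", rank))
    # offload_params=True keeps sharded parameters (and their gradients) in
    # CPU memory between uses, trading GPU memory for host-device transfers.
    return FSDP(module, cpu_offload=CPUOffload(offload_params=True))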
2022-05-18T04:22:52.1387724Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:52.1410001Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:52.1410557Z warnings.warn( 2022-05-18T04:22:52.1421804Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:52.1422593Z warnings.warn( 2022-05-18T04:22:52.3965917Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:52.3967223Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:52.3968498Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:52.3969752Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:52.3974746Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:52.3976013Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:52.3977275Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:52.3978678Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:22:52.6445234Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:52.6445754Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:53.4368185Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:53.4368894Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:53.4371891Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:53.4372568Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:53.6924718Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:53.6925411Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:53.9502726Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:53.9504300Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:53.9505584Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:53.9506857Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:54.8227821Z ok (5.436s) 2022-05-18T04:22:54.8361039Z test_delayed_optim_step_offload_true_none_none (__main__.TestParityWithDDP) ... 
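The python_variable.cpp:205 warnings above and below concern Tensors that are deallocated while a weak reference to them has been taken and dereferenced; the message itself names the internal helper _fix_weakref(). A toy illustration of the objects involved (this does not reproduce the warning, and _fix_weakref() is an undocumented internal method cited here only because the warning mentions it):

import weakref
import torch

t = torch.ones(3)
ref = weakref.ref(t)   # take a weak reference to the Tensor
again = ref()          # dereference it while the Tensor is still alive
# Per the warning text, code that dereferences a weak reference to a Tensor
# should call the internal helper afterwards so the PyObject bookkeeping
# stays consistent when the Tensor is later deallocated.
t._fix_weakref()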
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32579 2022-05-18T04:22:54.8466388Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32580 2022-05-18T04:22:55.7359459Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0s_inr2p 2022-05-18T04:22:55.7360648Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0s_inr2p/_remote_module_non_scriptable.py 2022-05-18T04:22:55.7573889Z dist init r=0, world=2 2022-05-18T04:22:55.7578207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:22:55.7660884Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpah18u4ek 2022-05-18T04:22:55.7663707Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpah18u4ek/_remote_module_non_scriptable.py 2022-05-18T04:22:55.7877388Z dist init r=1, world=2 2022-05-18T04:22:55.7881941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:22:55.7882765Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:55.7885061Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:22:57.1198006Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:57.1198640Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:57.5722357Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:57.5731516Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:57.5754789Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:57.5755369Z warnings.warn( 2022-05-18T04:22:57.5765562Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:22:57.5766319Z warnings.warn( 2022-05-18T04:22:57.8307402Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:57.8308752Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:57.8310032Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:22:57.8311310Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:57.8312567Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:57.8313900Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:57.8315162Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:57.8316626Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:22:58.0793110Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:58.0793631Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:58.8813795Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:58.8814573Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:58.8815523Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:22:58.8816187Z warnings.warn(msg, FutureWarning) 2022-05-18T04:22:59.1366295Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:59.1366794Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:22:59.6511490Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 
2022-05-18T04:22:59.6512307Z return iter(self.unbind(0)) 2022-05-18T04:22:59.6513463Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:22:59.6514248Z return iter(self.unbind(0)) 2022-05-18T04:23:00.2590265Z ok (5.436s) 2022-05-18T04:23:00.2724630Z test_delayed_optim_step_offload_true_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32662 2022-05-18T04:23:00.2832735Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32663 2022-05-18T04:23:01.2351736Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_s27avso 2022-05-18T04:23:01.2352756Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_s27avso/_remote_module_non_scriptable.py 2022-05-18T04:23:01.2363644Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi7h449wv 2022-05-18T04:23:01.2366440Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi7h449wv/_remote_module_non_scriptable.py 2022-05-18T04:23:01.2575391Z dist init r=1, world=2 2022-05-18T04:23:01.2579742Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:01.2591492Z dist init r=0, world=2 2022-05-18T04:23:01.2596160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:01.2597129Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:01.2683436Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:02.6286382Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:02.6286956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:03.0826237Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:03.0826770Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:03.0859029Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:03.0859610Z warnings.warn( 2022-05-18T04:23:03.0860347Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:03.0860893Z warnings.warn( 2022-05-18T04:23:03.3001626Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:23:03.3002940Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:03.3004487Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:03.3005754Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:03.3007019Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:03.3008283Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:03.3009528Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:03.3010782Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:03.4830260Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:03.4830781Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:03.9943268Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:03.9944293Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:03.9945269Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:23:03.9945937Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:04.1473385Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:04.1473904Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:04.4569304Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:23:04.4570367Z return iter(self.unbind(0)) 2022-05-18T04:23:04.4571515Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:23:04.4572288Z return iter(self.unbind(0)) 2022-05-18T04:23:04.9945176Z ok (4.735s) 2022-05-18T04:23:05.0083316Z test_delayed_optim_step_offload_true_prefetch_post_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32745 2022-05-18T04:23:05.0190945Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32746 2022-05-18T04:23:05.9127679Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbd4nl7_u 2022-05-18T04:23:05.9128753Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbd4nl7_u/_remote_module_non_scriptable.py 2022-05-18T04:23:05.9167724Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjl71cxqs 2022-05-18T04:23:05.9170583Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjl71cxqs/_remote_module_non_scriptable.py 2022-05-18T04:23:05.9342357Z dist init r=0, world=2 2022-05-18T04:23:05.9346788Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:05.9393232Z dist init r=1, world=2 2022-05-18T04:23:05.9397758Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:05.9398660Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:05.9450265Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:07.2759203Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:07.2759735Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:07.7306195Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:07.7317435Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
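The FutureWarning that keeps repeating in this run comes from torch/testing/_deprecated.py: torch.testing.assert_allclose() was deprecated in 1.12 and is slated for removal in 1.14, with torch.testing.assert_close() as the replacement. A minimal sketch of the swap, using made-up example tensors (nothing below comes from the test suite itself):

    import torch
    from torch.testing import assert_close

    actual = torch.tensor([1.0, 2.0, 3.0])
    expected = torch.tensor([1.0, 2.0, 3.0 + 1e-7])

    # Old call being flagged in the log (deprecated since 1.12):
    #   torch.testing.assert_allclose(actual, expected)

    # Replacement: raises an AssertionError if values, shape, or dtype do not
    # match within per-dtype default tolerances.
    assert_close(actual, expected)

    # Tolerances can still be set explicitly when the defaults do not fit.
    assert_close(actual, expected, rtol=1e-5, atol=1e-8)
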
2022-05-18T04:23:07.7339607Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:07.7340252Z warnings.warn( 2022-05-18T04:23:07.7351651Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:07.7352227Z warnings.warn( 2022-05-18T04:23:07.9898049Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:07.9899390Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:07.9900651Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:07.9902142Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:07.9907042Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:07.9908319Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:07.9909605Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:07.9910875Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:08.2377902Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
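The UserWarning from fully_sharded_data_parallel.py:911 fires because the module handed to FSDP still lives on the CPU, so FSDP temporarily moves it to that rank's GPU for parameter verification, flattening, and sharding, then moves it back. A rough sketch of the usual way to avoid that round trip, assuming CUDA is available and the process group has already been initialized; the wrap_for_rank helper name is made up for illustration:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_for_rank(module: torch.nn.Module, rank: int) -> FSDP:
        # Put the module on this rank's GPU *before* wrapping, so FSDP does not
        # have to migrate a CPU module back and forth (the cause of the
        # "Module is input on CPU, we are moving it to ..." warning).
        module = module.to(torch.device("cuda", rank))
        return FSDP(module)
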
2022-05-18T04:23:08.2378472Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:09.0302864Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:09.0303841Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:09.0308860Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:09.0309563Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:09.2861631Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:09.2862142Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:09.5441189Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:09.5442490Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:09.5443775Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:09.5445183Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:10.3314845Z ok (5.337s) 2022-05-18T04:23:10.3448939Z test_delayed_optim_step_offload_true_prefetch_post_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32828 2022-05-18T04:23:10.3555013Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32829 2022-05-18T04:23:11.2529513Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy4tpph61 2022-05-18T04:23:11.2530437Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy4tpph61/_remote_module_non_scriptable.py 2022-05-18T04:23:11.2745139Z dist init r=0, world=2 2022-05-18T04:23:11.2749661Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:11.2976648Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkhc6m6dy 2022-05-18T04:23:11.2979352Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkhc6m6dy/_remote_module_non_scriptable.py 2022-05-18T04:23:11.3199453Z dist init r=1, world=2 2022-05-18T04:23:11.3203872Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:11.3204788Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:11.3259686Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:12.6406967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:12.6407538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:13.0952024Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:13.0961871Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:13.0987476Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:13.0988814Z warnings.warn( 2022-05-18T04:23:13.0996524Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:13.0997752Z warnings.warn( 2022-05-18T04:23:13.3540914Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:13.3543079Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:13.3544636Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:23:13.3546124Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:13.3547399Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:13.3548662Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:13.3549920Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:13.3551504Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:13.6025839Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:13.6026778Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:14.4057838Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:14.4059227Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:14.4061326Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:14.4062618Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:14.6611968Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:14.6612908Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:15.1751173Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 
2022-05-18T04:23:15.1751992Z return iter(self.unbind(0)) 2022-05-18T04:23:15.1753130Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:23:15.1754078Z return iter(self.unbind(0)) 2022-05-18T04:23:15.7678476Z ok (5.436s) 2022-05-18T04:23:15.7811877Z test_delayed_optim_step_offload_true_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32911 2022-05-18T04:23:15.7916621Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32912 2022-05-18T04:23:16.6859347Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqs21rd13 2022-05-18T04:23:16.6860454Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqs21rd13/_remote_module_non_scriptable.py 2022-05-18T04:23:16.6956162Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4wpnofjo 2022-05-18T04:23:16.6958940Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4wpnofjo/_remote_module_non_scriptable.py 2022-05-18T04:23:16.7081093Z dist init r=0, world=2 2022-05-18T04:23:16.7085398Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:16.7175415Z dist init r=1, world=2 2022-05-18T04:23:16.7179346Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:16.7180490Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:16.7188973Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:18.0517635Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:18.0518175Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:18.5043037Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:18.5044014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:18.5077682Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:18.5078941Z warnings.warn( 2022-05-18T04:23:18.5080673Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:18.5082219Z warnings.warn( 2022-05-18T04:23:18.7622294Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:23:18.7624937Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:18.7626275Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:18.7627551Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:18.7629003Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:18.7630272Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:18.7631536Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:18.7633085Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:19.0107614Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:19.0108595Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:19.8120756Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:19.8122184Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:19.8123987Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:23:19.8125242Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:20.0672486Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:20.0673675Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:20.5807406Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:23:20.5808239Z return iter(self.unbind(0)) 2022-05-18T04:23:20.5809373Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:23:20.5810159Z return iter(self.unbind(0)) 2022-05-18T04:23:21.2039911Z ok (5.436s) 2022-05-18T04:23:21.2173333Z test_delayed_optim_step_offload_true_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32994 2022-05-18T04:23:21.2279339Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32995 2022-05-18T04:23:22.2024156Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa5kx02fg 2022-05-18T04:23:22.2025614Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa5kx02fg/_remote_module_non_scriptable.py 2022-05-18T04:23:22.2180268Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprt5hmf0y 2022-05-18T04:23:22.2182949Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprt5hmf0y/_remote_module_non_scriptable.py 2022-05-18T04:23:22.2240003Z dist init r=1, world=2 2022-05-18T04:23:22.2244059Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:22.2397666Z dist init r=0, world=2 2022-05-18T04:23:22.2401763Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:22.2402587Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:22.2449400Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:23.5790253Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:23.5790801Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:24.0355557Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:24.0364445Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
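The "dist init r=..., world=2", "Added key: store_based_barrier_key:1", and "Completed store-based barrier" lines are the two test ranks joining the process group; the store-based barrier runs inside init_process_group so every rank agrees the group is up before the test body starts. A self-contained sketch of the same two-rank setup on CPU with the gloo backend (the rendezvous address and port are placeholders):

    import os
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def worker(rank: int, world_size: int) -> None:
        os.environ["MASTER_ADDR"] = "127.0.0.1"   # placeholder rendezvous address
        os.environ["MASTER_PORT"] = "29501"       # placeholder port
        # init_process_group performs the store-based barrier that the log
        # reports for each rank before returning.
        dist.init_process_group("gloo", rank=rank, world_size=world_size)
        print(f"dist init r={rank}, world={world_size}")
        dist.barrier()
        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(worker, args=(2,), nprocs=2)
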
2022-05-18T04:23:24.0387500Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:24.0388061Z warnings.warn( 2022-05-18T04:23:24.0397741Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:24.0398315Z warnings.warn( 2022-05-18T04:23:24.2949556Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:24.2951122Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:24.2952405Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:24.2953690Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:24.2954964Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:24.2956232Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:24.2957610Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:24.2958870Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:24.5420770Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
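The repeated "[W python_variable.cpp:205]" warnings describe a tensor whose C++ object is freed while a Python weak reference to it was dereferenced without the follow-up the message asks for; in this run it is triggered inside the FSDP/DDP machinery rather than by the test code directly. The pattern the warning describes looks roughly like the sketch below. Tensor._fix_weakref() is a private PyTorch API, and whether it ever needs to be called by hand depends on how the tensor's ownership is passed around, so treat this purely as an illustration:

    import weakref
    import torch

    t = torch.randn(4)
    wr = weakref.ref(t)        # take a weak reference to the tensor

    resurrected = wr()         # dereference it while the tensor is still alive
    if resurrected is not None:
        # The warning in the log asks for this call after dereferencing, so the
        # Python wrapper and the underlying C++ tensor stay in sync when the
        # original reference is later dropped.
        resurrected._fix_weakref()

    del t, resurrected         # normal cleanup; nothing left to warn about here
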
2022-05-18T04:23:24.5421288Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:25.3345506Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:25.3346183Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:25.3347136Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:25.3347805Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:25.5898527Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:25.5899055Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:25.8475302Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:25.8476759Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:25.8478061Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:25.8479337Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:26.7403294Z ok (5.536s) 2022-05-18T04:23:26.7539870Z test_delayed_optim_step_offload_true_prefetch_pre_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33077 2022-05-18T04:23:26.7644903Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33078 2022-05-18T04:23:27.6627813Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphruv5nd7 2022-05-18T04:23:27.6628864Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphruv5nd7/_remote_module_non_scriptable.py 2022-05-18T04:23:27.6772081Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2ty0bw4z 2022-05-18T04:23:27.6774907Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2ty0bw4z/_remote_module_non_scriptable.py 2022-05-18T04:23:27.6841733Z dist init r=1, world=2 2022-05-18T04:23:27.6845854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:27.6996626Z dist init r=0, world=2 2022-05-18T04:23:27.7001077Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:27.7001865Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:27.7051086Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:29.0350080Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:29.0350781Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:29.4876436Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:29.4877213Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:29.4910652Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:29.4911504Z warnings.warn( 2022-05-18T04:23:29.4912284Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:29.4912831Z warnings.warn( 2022-05-18T04:23:29.7455664Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:29.7457418Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:29.7458738Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:23:29.7460005Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:29.7461276Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:29.7462547Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:29.7464150Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:29.7465445Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:29.9940390Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:29.9940899Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:30.7975278Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:30.7976181Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:30.7978064Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:30.7978860Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:31.0532141Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:31.0532671Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:31.5679802Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 
2022-05-18T04:23:31.5680862Z return iter(self.unbind(0)) 2022-05-18T04:23:31.5682040Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:23:31.5682829Z return iter(self.unbind(0)) 2022-05-18T04:23:32.1768508Z ok (5.436s) 2022-05-18T04:23:32.1902011Z test_delayed_optim_step_offload_true_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33160 2022-05-18T04:23:32.2007287Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33161 2022-05-18T04:23:33.1223969Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxg0ei9ox 2022-05-18T04:23:33.1225036Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxg0ei9ox/_remote_module_non_scriptable.py 2022-05-18T04:23:33.1438788Z dist init r=1, world=2 2022-05-18T04:23:33.1442878Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:33.1509205Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9nrpw9jq 2022-05-18T04:23:33.1511811Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9nrpw9jq/_remote_module_non_scriptable.py 2022-05-18T04:23:33.1735030Z dist init r=0, world=2 2022-05-18T04:23:33.1739427Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:33.1740253Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:33.1749421Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:34.5297505Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:34.5298028Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:34.9882731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:34.9892542Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:34.9915803Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:34.9916455Z warnings.warn( 2022-05-18T04:23:34.9928470Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:34.9929009Z warnings.warn( 2022-05-18T04:23:35.2474695Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:23:35.2476000Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:35.2477517Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:35.2478810Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:35.2480077Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:35.2481337Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:35.2482594Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:35.2483983Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:23:35.4961945Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:35.4962460Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:36.2997944Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:36.2998703Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:36.2999645Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:23:36.3000304Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:36.5551609Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:36.5552143Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:37.0695115Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:23:37.0695938Z return iter(self.unbind(0)) 2022-05-18T04:23:37.0697250Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:23:37.0698040Z return iter(self.unbind(0)) 2022-05-18T04:23:37.7132334Z ok (5.536s) 2022-05-18T04:23:37.7266023Z test_delayed_reduce_scatter_offload_false_none_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33243 2022-05-18T04:23:37.7371575Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33244 2022-05-18T04:23:38.6690343Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfy7h_rpd 2022-05-18T04:23:38.6691355Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfy7h_rpd/_remote_module_non_scriptable.py 2022-05-18T04:23:38.6718959Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp0fxdf8b 2022-05-18T04:23:38.6722079Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp0fxdf8b/_remote_module_non_scriptable.py 2022-05-18T04:23:38.6909320Z dist init r=1, world=2 2022-05-18T04:23:38.6913731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:38.6945185Z dist init r=0, world=2 2022-05-18T04:23:38.6949834Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:38.6951015Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:38.7017486Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:40.0304502Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:40.0305062Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:40.2317487Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:40.2322718Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:23:40.2351349Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:40.2352493Z warnings.warn( 2022-05-18T04:23:40.2357287Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:40.2358398Z warnings.warn( 2022-05-18T04:23:40.2702632Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:40.2704292Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:40.2708143Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:40.2709443Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:40.2797373Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:40.2798325Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:40.6448172Z ok (2.931s) 2022-05-18T04:23:40.6582132Z test_delayed_reduce_scatter_offload_false_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33326 2022-05-18T04:23:40.6688296Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33327 2022-05-18T04:23:41.5384010Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe_e228_s 2022-05-18T04:23:41.5385206Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe_e228_s/_remote_module_non_scriptable.py 2022-05-18T04:23:41.5598743Z dist init r=1, world=2 2022-05-18T04:23:41.5602723Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:41.5947658Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkn2s7c9v 2022-05-18T04:23:41.5950354Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkn2s7c9v/_remote_module_non_scriptable.py 2022-05-18T04:23:41.6164040Z dist init r=0, world=2 2022-05-18T04:23:41.6168464Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:41.6169279Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:41.6214781Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:42.9301074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:42.9301626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:43.1322624Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
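The FutureWarning repeated above points at the assert_allclose -> assert_close migration it describes; a minimal sketch of the replacement call, using hypothetical tensors a and b rather than anything from this test:

import torch

a = torch.randn(4)
b = a.clone()

# Deprecated since 1.12, to be removed in 1.14 (per the warning above):
#   torch.testing.assert_allclose(a, b)
# Replacement the warning suggests:
torch.testing.assert_close(a, b)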
2022-05-18T04:23:43.1323171Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:43.1355148Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:43.1355729Z warnings.warn( 2022-05-18T04:23:43.1356519Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:43.1357059Z warnings.warn( 2022-05-18T04:23:45.1703418Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:45.1704340Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:45.1705294Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:45.1705920Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:45.1791656Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:45.1792186Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:47.4839316Z ok (6.839s) 2022-05-18T04:23:47.4975267Z test_delayed_reduce_scatter_offload_false_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33409 2022-05-18T04:23:47.5080811Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33410 2022-05-18T04:23:48.4004906Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpob4hxg5e 2022-05-18T04:23:48.4006129Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpob4hxg5e/_remote_module_non_scriptable.py 2022-05-18T04:23:48.4071400Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpij7_yok7 2022-05-18T04:23:48.4074167Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpij7_yok7/_remote_module_non_scriptable.py 2022-05-18T04:23:48.4230397Z dist init r=1, world=2 2022-05-18T04:23:48.4234940Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:48.4287054Z dist init r=0, world=2 2022-05-18T04:23:48.4291054Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:48.4292056Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:48.4338538Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:23:49.7710361Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:49.7710903Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:49.9796531Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:49.9797046Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:49.9829094Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:49.9829964Z warnings.warn( 2022-05-18T04:23:49.9830733Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:49.9831274Z warnings.warn( 2022-05-18T04:23:51.4251821Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:51.4252549Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:51.4253475Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:51.4254134Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:51.4342984Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:51.4343862Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:52.9203664Z ok (5.436s) 2022-05-18T04:23:52.9335845Z test_delayed_reduce_scatter_offload_false_prefetch_post_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33492 2022-05-18T04:23:52.9440226Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33493 2022-05-18T04:23:53.8777658Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw3nksv78 2022-05-18T04:23:53.8778715Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw3nksv78/_remote_module_non_scriptable.py 2022-05-18T04:23:53.8896479Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpds3mdpbs 2022-05-18T04:23:53.8899335Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpds3mdpbs/_remote_module_non_scriptable.py 2022-05-18T04:23:53.8995839Z dist init r=0, world=2 2022-05-18T04:23:53.8999850Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:53.9120285Z dist init r=1, world=2 2022-05-18T04:23:53.9124382Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:53.9125718Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
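The fully_sharded_data_parallel.py:911 UserWarning above fires when the module handed to FSDP still lives on the CPU; a minimal sketch of moving the module to the target CUDA device before wrapping so the CPU -> GPU -> CPU shuffle the warning mentions is not needed (the module and device choice are illustrative, and a process group is assumed to be initialized already):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

device = torch.device("cuda", torch.cuda.current_device())
model = torch.nn.Linear(8, 8).to(device)  # hypothetical module, placed on the GPU up front
fsdp_model = FSDP(model)                  # assumes torch.distributed is already initialized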
2022-05-18T04:23:53.9205078Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:55.2466613Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:55.2467142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:55.4499787Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:55.4510151Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:55.4531917Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:55.4532487Z warnings.warn( 2022-05-18T04:23:55.4543667Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:55.4544220Z warnings.warn( 2022-05-18T04:23:55.4897374Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:55.4909862Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:55.4911393Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:23:55.4912111Z warnings.warn(msg, FutureWarning) 2022-05-18T04:23:55.4998427Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:55.5000193Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:55.8516530Z ok (2.931s) 2022-05-18T04:23:55.8649183Z test_delayed_reduce_scatter_offload_false_prefetch_post_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33575 2022-05-18T04:23:55.8753772Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33576 2022-05-18T04:23:56.7625067Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgyhkep6p 2022-05-18T04:23:56.7626456Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgyhkep6p/_remote_module_non_scriptable.py 2022-05-18T04:23:56.7841741Z dist init r=1, world=2 2022-05-18T04:23:56.7845986Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:23:56.8105780Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp86e4h19 2022-05-18T04:23:56.8108804Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp86e4h19/_remote_module_non_scriptable.py 2022-05-18T04:23:56.8327021Z dist init r=0, world=2 2022-05-18T04:23:56.8331443Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:23:56.8332451Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:56.8355726Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:23:58.1735481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:58.1736331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:58.3692456Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:58.3700660Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:23:58.3724824Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:58.3725400Z warnings.warn( 2022-05-18T04:23:58.3733750Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:23:58.3734298Z warnings.warn( 2022-05-18T04:24:00.4097092Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:00.4097827Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:00.4099126Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:00.4099851Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:00.4193709Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:00.4194217Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:24:02.7905811Z ok (6.939s) 2022-05-18T04:24:02.8039783Z test_delayed_reduce_scatter_offload_false_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33658 2022-05-18T04:24:02.8145801Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33659 2022-05-18T04:24:03.7156891Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprnpzye_s 2022-05-18T04:24:03.7157545Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprnpzye_s/_remote_module_non_scriptable.py 2022-05-18T04:24:03.7202321Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpedyxlit3 2022-05-18T04:24:03.7204906Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpedyxlit3/_remote_module_non_scriptable.py 2022-05-18T04:24:03.7371717Z dist init r=0, world=2 2022-05-18T04:24:03.7376065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:24:03.7428017Z dist init r=1, world=2 2022-05-18T04:24:03.7432385Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:24:03.7433324Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:03.7479475Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:05.0751159Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:05.0751691Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:05.2679892Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:05.2685206Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:05.2713137Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:05.2713729Z warnings.warn( 2022-05-18T04:24:05.2717894Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:05.2718454Z warnings.warn( 2022-05-18T04:24:07.3061355Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:07.3062068Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:07.3063009Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:07.3063880Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:07.3152705Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:24:07.3153525Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:09.6295118Z ok (6.839s) 2022-05-18T04:24:09.6428007Z test_delayed_reduce_scatter_offload_false_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33741 2022-05-18T04:24:09.6533805Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33742 2022-05-18T04:24:10.5633149Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyccbs93f 2022-05-18T04:24:10.5634050Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyccbs93f/_remote_module_non_scriptable.py 2022-05-18T04:24:10.5850695Z dist init r=1, world=2 2022-05-18T04:24:10.5854838Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:24:10.6119009Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy_irh4gf 2022-05-18T04:24:10.6121833Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy_irh4gf/_remote_module_non_scriptable.py 2022-05-18T04:24:10.6342106Z dist init r=0, world=2 2022-05-18T04:24:10.6347078Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:24:10.6347882Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:10.6364614Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:11.9764659Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:11.9765200Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:12.1709239Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:12.1709783Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:12.1741080Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:12.1741646Z warnings.warn( 2022-05-18T04:24:12.1743492Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:12.1744375Z warnings.warn( 2022-05-18T04:24:12.2092917Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:12.2093624Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:12.2098205Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:24:12.2098878Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:12.2189433Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:12.2189911Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:12.5609753Z ok (2.931s) 2022-05-18T04:24:12.5739857Z test_delayed_reduce_scatter_offload_false_prefetch_pre_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33824 2022-05-18T04:24:12.5845523Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33825 2022-05-18T04:24:13.4518180Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp43rtmvcd 2022-05-18T04:24:13.4519068Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp43rtmvcd/_remote_module_non_scriptable.py 2022-05-18T04:24:13.4666410Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp52bqlrpg 2022-05-18T04:24:13.4669320Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp52bqlrpg/_remote_module_non_scriptable.py 2022-05-18T04:24:13.4733849Z dist init r=0, world=2 2022-05-18T04:24:13.4737989Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:24:13.4891493Z dist init r=1, world=2 2022-05-18T04:24:13.4895917Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:24:13.4896727Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:13.4943323Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:14.8299352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:14.8299861Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:15.0308439Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:15.0318222Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:15.0340142Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:15.0340726Z warnings.warn( 2022-05-18T04:24:15.0352140Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:15.0352671Z warnings.warn( 2022-05-18T04:24:17.0718846Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:17.0719595Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:17.0721095Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:17.0721780Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:17.0812219Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:17.0814383Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:19.3998548Z ok (6.839s) 2022-05-18T04:24:19.4133611Z test_delayed_reduce_scatter_offload_false_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33907 2022-05-18T04:24:19.4239396Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33908 2022-05-18T04:24:20.3523921Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0anlnku8 2022-05-18T04:24:20.3525129Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0anlnku8/_remote_module_non_scriptable.py 2022-05-18T04:24:20.3688733Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppvl87rsh 2022-05-18T04:24:20.3691458Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppvl87rsh/_remote_module_non_scriptable.py 2022-05-18T04:24:20.3737465Z dist init r=0, world=2 2022-05-18T04:24:20.3741547Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:24:20.3905094Z dist init r=1, world=2 2022-05-18T04:24:20.3909522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:24:20.3910355Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:20.3947144Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:21.7178028Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:21.7178540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:21.9171727Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:21.9172254Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:21.9203927Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:21.9204512Z warnings.warn( 2022-05-18T04:24:21.9205268Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:21.9205820Z warnings.warn( 2022-05-18T04:24:23.9569767Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:24:23.9570646Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:23.9571843Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:23.9572919Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:23.9659775Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:23.9660289Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:26.3396559Z ok (6.940s) 2022-05-18T04:24:26.3529528Z test_delayed_reduce_scatter_offload_true_none_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33990 2022-05-18T04:24:26.3635145Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33991 2022-05-18T04:24:27.2628125Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0hc3oob6 2022-05-18T04:24:27.2629281Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0hc3oob6/_remote_module_non_scriptable.py 2022-05-18T04:24:27.2855022Z dist init r=1, world=2 2022-05-18T04:24:27.2859686Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:24:27.3049325Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvkfv30gb 2022-05-18T04:24:27.3051794Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvkfv30gb/_remote_module_non_scriptable.py 2022-05-18T04:24:27.3266980Z dist init r=0, world=2 2022-05-18T04:24:27.3271111Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:24:27.3271940Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:27.3370144Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:28.6754676Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:28.6755676Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:28.8761702Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:28.8762739Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:28.8794062Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:28.8795237Z warnings.warn( 2022-05-18T04:24:28.8796746Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:28.8797789Z warnings.warn( 2022-05-18T04:24:28.8908128Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:28.8911542Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
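The "dist init r=N, world=2" and store-based-barrier lines above come from two ranks joining a single process group; a minimal sketch of that kind of two-rank initialization (backend, address, and port are illustrative, not read from this job):

import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
rank = int(os.environ.get("RANK", "0"))

# init_process_group performs the store-based barrier logged above once both ranks have joined
dist.init_process_group(backend="nccl", rank=rank, world_size=2)
print(f"dist init r={rank}, world=2")
dist.destroy_process_group()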
2022-05-18T04:24:28.8916323Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:28.8918921Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:28.8920996Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:28.8922316Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:28.8923601Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:28.8924871Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:28.8926129Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:28.8927506Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:28.9428155Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:28.9429448Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:28.9443033Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:24:28.9444477Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:28.9536636Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:28.9539600Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:28.9661787Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:28.9664646Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:28.9667152Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:28.9669853Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:29.2714554Z ok (2.932s) 2022-05-18T04:24:29.2853873Z test_delayed_reduce_scatter_offload_true_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34073 2022-05-18T04:24:29.2962146Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34074 2022-05-18T04:24:30.1972070Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvz5i1vl7 2022-05-18T04:24:30.1973449Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvz5i1vl7/_remote_module_non_scriptable.py 2022-05-18T04:24:30.2186062Z dist init r=0, world=2 2022-05-18T04:24:30.2190187Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:24:30.2553085Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjvefdrfo 2022-05-18T04:24:30.2555879Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjvefdrfo/_remote_module_non_scriptable.py 2022-05-18T04:24:30.2780798Z dist init r=1, world=2 2022-05-18T04:24:30.2785132Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:24:30.2786713Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:30.2801421Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
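The python_variable.cpp:205 warnings above describe a weak-reference pattern; the sketch below only illustrates the sequence the warning text names (take a weak reference, dereference it, call Tensor._fix_weakref()), using an internal API the warning itself mentions, and is not expected to reproduce the warning, which is raised from PyTorch internals during these FSDP tests:

import weakref
import torch

t = torch.randn(2)
ref = weakref.ref(t)  # take out a weak reference to the Tensor
_ = ref()             # dereference it
t._fix_weakref()      # the internal call the warning says should follow
del t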
2022-05-18T04:24:31.6132762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:31.6133361Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:31.8106845Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:31.8116162Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:31.8138165Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:31.8138753Z warnings.warn( 2022-05-18T04:24:31.8150051Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:31.8150590Z warnings.warn( 2022-05-18T04:24:31.8263938Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:24:31.8264749Z param.add_(d_p, alpha=alpha) 2022-05-18T04:24:31.8265904Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:24:31.8266688Z param.add_(d_p, alpha=alpha) 2022-05-18T04:24:31.8270656Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:31.8273117Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:33.8781669Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:33.8782581Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:33.8783907Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:33.8784596Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:33.8875765Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:33.8877033Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:33.9008484Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:33.9010249Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:33.9011525Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:33.9012832Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:36.2126768Z ok (6.941s) 2022-05-18T04:24:36.2263409Z test_delayed_reduce_scatter_offload_true_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34156 2022-05-18T04:24:36.2371689Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34157 2022-05-18T04:24:37.1439828Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5xv2iaon 2022-05-18T04:24:37.1440615Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5xv2iaon/_remote_module_non_scriptable.py 2022-05-18T04:24:37.1662557Z dist init r=1, world=2 2022-05-18T04:24:37.1667259Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:24:37.1908450Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1reoudnl 2022-05-18T04:24:37.1911026Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1reoudnl/_remote_module_non_scriptable.py 2022-05-18T04:24:37.2125540Z dist init r=0, world=2 2022-05-18T04:24:37.2129868Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:24:37.2130910Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:37.2177983Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:38.5641039Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:38.5641673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:38.7693046Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:38.7702085Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:38.7726171Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:24:38.7726722Z warnings.warn( 2022-05-18T04:24:38.7735359Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:38.7735894Z warnings.warn( 2022-05-18T04:24:38.7850141Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:24:38.7851215Z param.add_(d_p, alpha=alpha) 2022-05-18T04:24:38.7852357Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:24:38.7853114Z param.add_(d_p, alpha=alpha) 2022-05-18T04:24:38.7857986Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:38.7858492Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:40.2452142Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:40.2452892Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:40.2453990Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:40.2454656Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:40.2546667Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:40.2547498Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:40.2679177Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:40.2680506Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:40.2682076Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:40.2683363Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:41.7499679Z ok (5.537s) 2022-05-18T04:24:41.7634750Z test_delayed_reduce_scatter_offload_true_prefetch_post_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34239 2022-05-18T04:24:41.7740147Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34240 2022-05-18T04:24:42.7081501Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprae5b0j0 2022-05-18T04:24:42.7082848Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprae5b0j0/_remote_module_non_scriptable.py 2022-05-18T04:24:42.7170444Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp3twfya6 2022-05-18T04:24:42.7173221Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp3twfya6/_remote_module_non_scriptable.py 2022-05-18T04:24:42.7298850Z dist init r=0, world=2 2022-05-18T04:24:42.7303085Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:24:42.7398510Z dist init r=1, world=2 2022-05-18T04:24:42.7402696Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:24:42.7403903Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:42.7406265Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:44.0864209Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:44.0864715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:44.2855594Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:44.2856152Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:44.2887033Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:44.2887604Z warnings.warn( 2022-05-18T04:24:44.2888353Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:44.2888897Z warnings.warn( 2022-05-18T04:24:44.2999305Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:44.3001567Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
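The torch/optim/sgd.py:241 line quoted in the warnings just above, param.add_(d_p, alpha=alpha), is the in-place SGD update; a stand-alone equivalent with illustrative values (in the plain, momentum-free, minimizing case alpha is -lr):

import torch

lr = 0.1
param = torch.randn(3)
d_p = torch.randn(3)        # gradient term for this parameter
param.add_(d_p, alpha=-lr)  # in-place update: param <- param - lr * d_p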
2022-05-18T04:24:44.3006164Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.3008217Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.3009550Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.3010828Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.3012147Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.3013408Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.3014778Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.3016034Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.3522245Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:44.3522928Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:44.3531492Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:24:44.3532173Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:44.3623639Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:44.3625432Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:44.3745574Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.3746894Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.3748353Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.3749649Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:44.6818290Z ok (2.932s) 2022-05-18T04:24:44.6960377Z test_delayed_reduce_scatter_offload_true_prefetch_post_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34322 2022-05-18T04:24:44.7064498Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34323 2022-05-18T04:24:45.6006413Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppq8jy2c2 2022-05-18T04:24:45.6007615Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppq8jy2c2/_remote_module_non_scriptable.py 2022-05-18T04:24:45.6014793Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphk06eo6h 2022-05-18T04:24:45.6017510Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphk06eo6h/_remote_module_non_scriptable.py 2022-05-18T04:24:45.6229422Z dist init r=1, world=2 2022-05-18T04:24:45.6231124Z dist init r=0, world=2 2022-05-18T04:24:45.6233914Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:24:45.6234815Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:24:45.6235836Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:45.6236787Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
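The "dist init r=..., world=2" prints and the store_based_barrier_key INFO lines come from the two spawned worker processes joining a two-rank process group; c10d logs the store-based barrier once every rank has registered. A rough sketch of what each worker does, assuming NCCL and a TCP rendezvous address chosen only as a placeholder (the harness's own print produces the "dist init" line):

    import torch.distributed as dist

    def worker(rank: int, world_size: int) -> None:
        # Joining the group triggers the c10d logs seen above:
        # "Added key: store_based_barrier_key:1 to store for rank: ..."
        # "Rank N: Completed store-based barrier ... with 2 nodes."
        dist.init_process_group(
            backend="nccl",
            init_method="tcp://127.0.0.1:29500",  # placeholder rendezvous
            rank=rank,
            world_size=world_size,
        )
        print(f"dist init r={rank}, world={world_size}")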
2022-05-18T04:24:46.9606591Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:46.9607114Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:47.1579237Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:47.1589291Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:47.1611907Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:47.1612722Z warnings.warn( 2022-05-18T04:24:47.1623001Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:47.1623933Z warnings.warn( 2022-05-18T04:24:47.1738183Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:24:47.1738995Z param.add_(d_p, alpha=alpha) 2022-05-18T04:24:47.1740424Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:24:47.1741211Z param.add_(d_p, alpha=alpha) 2022-05-18T04:24:47.1744592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:47.1746933Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:49.2247020Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:49.2247750Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:49.2249399Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:49.2250190Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:49.2340866Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:49.2343111Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:49.2473695Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:49.2475153Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:49.2476432Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:49.2477702Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:51.6229636Z ok (6.941s) 2022-05-18T04:24:51.6360901Z test_delayed_reduce_scatter_offload_true_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34405 2022-05-18T04:24:51.6466662Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34406 2022-05-18T04:24:52.5386277Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp5lnihvk 2022-05-18T04:24:52.5387190Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp5lnihvk/_remote_module_non_scriptable.py 2022-05-18T04:24:52.5610088Z dist init r=1, world=2 2022-05-18T04:24:52.5614501Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:24:52.5711311Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb6ialh00 2022-05-18T04:24:52.5714028Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb6ialh00/_remote_module_non_scriptable.py 2022-05-18T04:24:52.5928405Z dist init r=0, world=2 2022-05-18T04:24:52.5932320Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:24:52.5933626Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:52.6023098Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:53.9278514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:53.9279068Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:54.1263767Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:54.1264539Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:54.1295450Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:24:54.1296036Z warnings.warn( 2022-05-18T04:24:54.1296795Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:24:54.1297340Z warnings.warn( 2022-05-18T04:24:54.1406475Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:24:54.1407631Z param.add_(d_p, alpha=alpha) 2022-05-18T04:24:54.1408791Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:24:54.1409560Z param.add_(d_p, alpha=alpha) 2022-05-18T04:24:54.1413422Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:54.1413942Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:56.1895213Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:56.1896009Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:56.1896976Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:24:56.1897639Z warnings.warn(msg, FutureWarning) 2022-05-18T04:24:56.1984302Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:56.1984797Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:24:56.2112303Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:56.2113879Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:56.2115174Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:56.2116450Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:24:58.5630006Z ok (6.940s) 2022-05-18T04:24:58.5763013Z test_delayed_reduce_scatter_offload_true_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34488 2022-05-18T04:24:58.5870170Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34489 2022-05-18T04:24:59.4814835Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7lmpublx 2022-05-18T04:24:59.4816167Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7lmpublx/_remote_module_non_scriptable.py 2022-05-18T04:24:59.5030603Z dist init r=0, world=2 2022-05-18T04:24:59.5034726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:24:59.5315829Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppjny_ye9 2022-05-18T04:24:59.5318793Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppjny_ye9/_remote_module_non_scriptable.py 2022-05-18T04:24:59.5540341Z dist init r=1, world=2 2022-05-18T04:24:59.5545079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:24:59.5546233Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:24:59.5646771Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:00.8821929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:00.8822443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:01.0810830Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:01.0811407Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:01.0842493Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:01.0843070Z warnings.warn( 2022-05-18T04:25:01.0843832Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:01.0844368Z warnings.warn( 2022-05-18T04:25:01.0949535Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:01.0950033Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
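The fully_sharded_data_parallel.py:911 UserWarning fires because the module handed to FSDP still lives on the CPU, so FSDP temporarily moves it to the local GPU (device 0 or 1 here) for parameter verification, flattening, and sharding, then moves it back. In user code the warning can usually be avoided by placing the module on the target device before wrapping; a minimal sketch, assuming one CUDA device per rank:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    rank = 0  # placeholder; normally this process's local rank
    device = torch.device("cuda", rank)

    model = torch.nn.Linear(8, 8).to(device)  # move before wrapping
    fsdp_model = FSDP(model)                  # no "Module is input on CPU" warning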
2022-05-18T04:25:01.0957904Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.0960399Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.0962827Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.0965259Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.0967750Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.0969405Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.0970689Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.0971949Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.1464911Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:01.1465639Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:01.1470563Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:25:01.1471258Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:01.1558604Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:01.1559154Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:01.1674294Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.1675832Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.1677137Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.1678417Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:01.4953946Z ok (2.932s) 2022-05-18T04:25:01.5086566Z test_delayed_reduce_scatter_offload_true_prefetch_pre_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34571 2022-05-18T04:25:01.5192543Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34572 2022-05-18T04:25:02.4640544Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn6kq3p_s 2022-05-18T04:25:02.4641408Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpopnyh8fh 2022-05-18T04:25:02.4642197Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn6kq3p_s/_remote_module_non_scriptable.py 2022-05-18T04:25:02.4644316Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpopnyh8fh/_remote_module_non_scriptable.py 2022-05-18T04:25:02.4864728Z dist init r=1, world=2 2022-05-18T04:25:02.4868601Z dist init r=0, world=2 2022-05-18T04:25:02.4869499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:02.4872634Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:02.4873428Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:02.4973274Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
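The FutureWarning from torch/testing/_deprecated.py repeated throughout this output is the upgrade notice for torch.testing.assert_allclose(); issue 61844 describes the migration. The replacement call, sketched with placeholder tensors and tolerances:

    import torch

    actual = torch.tensor([1.0, 2.0])
    expected = torch.tensor([1.0, 2.0])

    # Deprecated since 1.12 (this call emits the FutureWarning seen above):
    torch.testing.assert_allclose(actual, expected)

    # Replacement; rtol/atol are optional and shown only as example values.
    torch.testing.assert_close(actual, expected, rtol=1.3e-6, atol=1e-5)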
2022-05-18T04:25:03.8385947Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:03.8386481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:04.0437730Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:04.0441428Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:04.0470501Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:04.0471057Z warnings.warn( 2022-05-18T04:25:04.0475353Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:04.0475912Z warnings.warn( 2022-05-18T04:25:04.0589813Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:25:04.0590599Z param.add_(d_p, alpha=alpha) 2022-05-18T04:25:04.0592025Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:25:04.0592812Z param.add_(d_p, alpha=alpha) 2022-05-18T04:25:04.0597381Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:04.0597879Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:06.1099898Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:06.1100659Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:06.1101634Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:06.1102596Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:06.1192574Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:06.1193299Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:06.1325576Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:06.1326910Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:06.1328167Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:06.1329438Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:08.4358950Z ok (6.940s) 2022-05-18T04:25:08.4489917Z test_delayed_reduce_scatter_offload_true_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34654 2022-05-18T04:25:08.4598678Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34655 2022-05-18T04:25:09.3558756Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp66qsssc5 2022-05-18T04:25:09.3560250Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp66qsssc5/_remote_module_non_scriptable.py 2022-05-18T04:25:09.3783695Z dist init r=1, world=2 2022-05-18T04:25:09.3788187Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:09.3932424Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn94osf5r 2022-05-18T04:25:09.3935224Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn94osf5r/_remote_module_non_scriptable.py 2022-05-18T04:25:09.4148133Z dist init r=0, world=2 2022-05-18T04:25:09.4152591Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:09.4153401Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:09.4197132Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:10.7678466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:10.7678998Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:10.9664778Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:10.9665326Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:10.9696885Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:25:10.9697466Z warnings.warn( 2022-05-18T04:25:10.9698213Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:10.9699038Z warnings.warn( 2022-05-18T04:25:10.9807602Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:25:10.9808391Z param.add_(d_p, alpha=alpha) 2022-05-18T04:25:10.9809519Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:25:10.9810279Z param.add_(d_p, alpha=alpha) 2022-05-18T04:25:10.9814836Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:10.9815336Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:13.0293667Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:13.0294373Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:13.0295306Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:13.0295975Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:13.0381802Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:13.0382284Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:25:13.0510790Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:13.0512137Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:13.0513409Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:13.0514678Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:15.3758964Z ok (6.940s) 2022-05-18T04:25:15.3889469Z test_mixture_of_experts_offload_false_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34737 2022-05-18T04:25:15.3993959Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34738 2022-05-18T04:25:16.3110679Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9oyvewhw 2022-05-18T04:25:16.3111654Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9oyvewhw/_remote_module_non_scriptable.py 2022-05-18T04:25:16.3312980Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwc6rczy5 2022-05-18T04:25:16.3315628Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwc6rczy5/_remote_module_non_scriptable.py 2022-05-18T04:25:16.3325128Z dist init r=1, world=2 2022-05-18T04:25:16.3329203Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:16.3530652Z dist init r=0, world=2 2022-05-18T04:25:16.3534665Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:16.3535642Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:16.3636421Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:17.6952339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:17.6952864Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:17.8998780Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:17.8999385Z warnings.warn( 2022-05-18T04:25:17.9003567Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:17.9004117Z warnings.warn( 2022-05-18T04:25:17.9038932Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:17.9046432Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:17.9047576Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:17.9142058Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
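The repeated "Reducer buckets have been rebuilt in this iteration" INFO lines come from the DDP baseline that TestParityWithDDP trains alongside the FSDP model: DistributedDataParallel rebuilds its gradient buckets once, after the first backward pass, based on the order in which gradients become ready. A sketch of the DDP side, assuming an initialized process group and one CUDA device per rank (bucket_cap_mb is shown only to indicate where bucket sizing is configured; 25 MB is its usual default):

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    rank = 0  # placeholder local rank
    model = torch.nn.Linear(8, 8).to(torch.device("cuda", rank))

    ddp_model = DDP(model, device_ids=[rank], bucket_cap_mb=25)
    # After the first backward/step, DDP logs
    # "Reducer buckets have been rebuilt in this iteration."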
2022-05-18T04:25:17.9194194Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:17.9195504Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:17.9196772Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:17.9198035Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:17.9199437Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:17.9200704Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:17.9683168Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:17.9683907Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:17.9686651Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:17.9687326Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:17.9814033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:17.9822777Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:17.9823869Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:17.9916971Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 
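The test names ending in clip_norm_type_2_0 and clip_norm_type_None indicate whether the FSDP/DDP parity check also exercises gradient clipping with norm_type 2.0; the _None variants appear to skip clipping. A sketch of the DDP-side clipping call, with a placeholder max_norm (on the FSDP side the wrapper's own clip_grad_norm_ method, which accounts for sharded gradients, would be the counterpart, assuming it is available at this commit):

    import torch

    model = torch.nn.Linear(8, 8)
    model(torch.randn(2, 8)).sum().backward()

    # norm_type=2.0 corresponds to the "_clip_norm_type_2_0" variants.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.3, norm_type=2.0)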
2022-05-18T04:25:18.4072873Z ok (3.031s) 2022-05-18T04:25:18.4205454Z test_mixture_of_experts_offload_false_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34840 2022-05-18T04:25:18.4310090Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34841 2022-05-18T04:25:19.3193824Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkye74aes 2022-05-18T04:25:19.3194913Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkye74aes/_remote_module_non_scriptable.py 2022-05-18T04:25:19.3419001Z dist init r=0, world=2 2022-05-18T04:25:19.3423253Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:19.3988100Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaq0g3nym 2022-05-18T04:25:19.3990987Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaq0g3nym/_remote_module_non_scriptable.py 2022-05-18T04:25:19.4216534Z dist init r=1, world=2 2022-05-18T04:25:19.4221377Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:19.4222391Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:19.4237724Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:20.7898617Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:20.7899197Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:20.9916084Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:20.9917028Z warnings.warn( 2022-05-18T04:25:20.9950235Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:20.9950795Z warnings.warn( 2022-05-18T04:25:20.9975567Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:20.9994409Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:20.9995223Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:21.0078729Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:21.0132325Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:21.0133621Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:21.0134876Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:21.0136142Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:21.0137538Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:21.0138817Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:21.0637350Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:21.0638064Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:21.0639006Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:21.0639633Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:21.0775943Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:21.0776700Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:21.0777413Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:21.0778252Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:21.4388113Z ok (3.031s) 2022-05-18T04:25:21.4520749Z test_mixture_of_experts_offload_false_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34943 2022-05-18T04:25:21.4625436Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34944 2022-05-18T04:25:22.3409307Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoxg2b9_g 2022-05-18T04:25:22.3410764Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoxg2b9_g/_remote_module_non_scriptable.py 2022-05-18T04:25:22.3600852Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5mik0h1t 2022-05-18T04:25:22.3603204Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5mik0h1t/_remote_module_non_scriptable.py 2022-05-18T04:25:22.3650836Z dist init r=1, world=2 2022-05-18T04:25:22.3655414Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:22.3821263Z dist init r=0, world=2 2022-05-18T04:25:22.3825524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:22.3826742Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:22.3860716Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:23.7136287Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:23.7136806Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:23.9066115Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:23.9066724Z warnings.warn( 2022-05-18T04:25:23.9110800Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:23.9111409Z warnings.warn( 2022-05-18T04:25:23.9134782Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:23.9152270Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:23.9152979Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:23.9237894Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:23.9293947Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:23.9295251Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:23.9296671Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:23.9297941Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:23.9299210Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:23.9300465Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:23.9575322Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:23.9576011Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:23.9577376Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:23.9578111Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:23.9707587Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:23.9711986Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:23.9712774Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:23.9810662Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:24.2699567Z ok (2.831s) 2022-05-18T04:25:24.2832808Z test_mixture_of_experts_offload_false_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35030 2022-05-18T04:25:24.2941974Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35031 2022-05-18T04:25:25.1837308Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptx9zzx7w 2022-05-18T04:25:25.1839120Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptx9zzx7w/_remote_module_non_scriptable.py 2022-05-18T04:25:25.2062141Z dist init r=1, world=2 2022-05-18T04:25:25.2066864Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:25.2291983Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcd81do83 2022-05-18T04:25:25.2294731Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcd81do83/_remote_module_non_scriptable.py 2022-05-18T04:25:25.2512824Z dist init r=0, world=2 2022-05-18T04:25:25.2516975Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:25.2518073Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:25.2577398Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:26.5884181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:26.5884728Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:26.7847567Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:26.7848192Z warnings.warn( 2022-05-18T04:25:26.7866457Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:26.7866999Z warnings.warn( 2022-05-18T04:25:26.7891107Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:26.7909355Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:26.7910316Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:26.7994161Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:26.8049902Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:26.8051257Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:26.8052737Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:26.8054031Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:26.8055298Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:26.8056559Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:26.8328486Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:26.8329309Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:26.8330473Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:26.8331116Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:26.8459507Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:26.8462046Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:26.8463024Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:26.8562550Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:27.2018123Z ok (2.932s) 2022-05-18T04:25:27.2153434Z test_mixture_of_experts_offload_false_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
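The UserWarning from fully_sharded_data_parallel.py above is emitted when a module still on CPU is handed to FSDP, which then moves it to the GPU for flattening and sharding and back again. A hedged sketch of the device placement that avoids that round trip; the rank and module here are placeholders and an already-initialized default process group is assumed:

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

rank = 0                  # placeholder; normally the local rank of this process
module = nn.Linear(8, 8)  # placeholder module

# Wrapping the CPU module directly triggers the warning seen in the log.
# Moving it to the local CUDA device before wrapping sidesteps the transfer.
module = module.to(torch.device("cuda", rank))
sharded = FSDP(module)
```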
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35117 2022-05-18T04:25:27.2259622Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35118 2022-05-18T04:25:28.1287048Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph5xf28i5 2022-05-18T04:25:28.1288088Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph5xf28i5/_remote_module_non_scriptable.py 2022-05-18T04:25:28.1308876Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnyx45zbq 2022-05-18T04:25:28.1311938Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnyx45zbq/_remote_module_non_scriptable.py 2022-05-18T04:25:28.1504292Z dist init r=1, world=2 2022-05-18T04:25:28.1508998Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:28.1534604Z dist init r=0, world=2 2022-05-18T04:25:28.1539032Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:28.1539851Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:28.1612695Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:29.5159003Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:29.5159556Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:29.7209389Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:29.7210009Z warnings.warn( 2022-05-18T04:25:29.7216586Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:29.7217134Z warnings.warn( 2022-05-18T04:25:29.7250344Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:29.7260839Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:29.7261515Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:29.7353689Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:29.7410980Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:29.7412338Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:29.7413592Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:29.7414867Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:29.7416136Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:29.7417391Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:29.7692578Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:29.7693250Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:29.7694871Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:29.7695537Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:29.7823800Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:29.7832890Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:29.7833585Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:29.7926477Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:30.1336642Z ok (2.932s) 2022-05-18T04:25:30.1470999Z test_mixture_of_experts_offload_false_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... 
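The "dist init r=<rank>, world=2" and store_based_barrier_key lines repeated in every test correspond to two worker processes joining a single process group. A minimal per-rank sketch under assumed settings (the rendezvous address, port, and backend are illustrative, and both ranks must run it):

```python
import os
import torch.distributed as dist

# Placeholder rendezvous settings; the test harness supplies its own.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

rank = int(os.environ.get("RANK", "0"))

# Each of the two processes makes this call; the store-based barrier logged
# above is what completes once both ranks have registered with the store.
dist.init_process_group(backend="gloo", rank=rank, world_size=2)

dist.barrier()
dist.destroy_process_group()
```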
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35204 2022-05-18T04:25:30.1578194Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35205 2022-05-18T04:25:31.0509472Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfcpfafw5 2022-05-18T04:25:31.0510715Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfcpfafw5/_remote_module_non_scriptable.py 2022-05-18T04:25:31.0710621Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu_rjaztm 2022-05-18T04:25:31.0713775Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu_rjaztm/_remote_module_non_scriptable.py 2022-05-18T04:25:31.0724233Z dist init r=1, world=2 2022-05-18T04:25:31.0728388Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:31.0937687Z dist init r=0, world=2 2022-05-18T04:25:31.0942107Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:31.0943100Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:31.1035533Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:32.4594351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:32.4594910Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:32.6621151Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:32.6621753Z warnings.warn( 2022-05-18T04:25:32.6622554Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:32.6623099Z warnings.warn( 2022-05-18T04:25:32.6661441Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:32.6665147Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:32.6666178Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:32.6764460Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:32.6821661Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:32.6823209Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:32.6824918Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:32.6826199Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:32.6827463Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:32.6828844Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:32.7102786Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:32.7103477Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:32.7105083Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:32.7105761Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:32.7233823Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:32.7242430Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:32.7243103Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:32.7336740Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:33.0656116Z ok (2.932s) 2022-05-18T04:25:33.0790926Z test_mixture_of_experts_offload_false_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35291 2022-05-18T04:25:33.0897043Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35292 2022-05-18T04:25:33.9946236Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp92slctqq 2022-05-18T04:25:33.9947268Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp92slctqq/_remote_module_non_scriptable.py 2022-05-18T04:25:33.9977287Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpehemh6sp 2022-05-18T04:25:33.9980489Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpehemh6sp/_remote_module_non_scriptable.py 2022-05-18T04:25:34.0162336Z dist init r=1, world=2 2022-05-18T04:25:34.0166954Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:34.0204809Z dist init r=0, world=2 2022-05-18T04:25:34.0209490Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:34.0210282Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:34.0270764Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:35.3617799Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:35.3618332Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:35.5640295Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:35.5640899Z warnings.warn( 2022-05-18T04:25:35.5667781Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:35.5668839Z warnings.warn( 2022-05-18T04:25:35.5691369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:35.5712494Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:35.5713382Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:35.5794433Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:35.5846693Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:35.5848274Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:35.5849812Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:35.5851305Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:35.5852760Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:35.5854358Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:35.6369556Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:35.6370451Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:35.6371427Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:35.6372265Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:35.6499742Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:35.6509333Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:35.6510043Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:35.6602775Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:35.9974767Z ok (2.932s) 2022-05-18T04:25:36.0109043Z test_mixture_of_experts_offload_false_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... 
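The recurring python_variable.cpp warning concerns weak references to Tensor PyObjects. The snippet below only illustrates the vocabulary the warning uses; it is not a reproduction of the condition the tests hit:

```python
import weakref
import torch

t = torch.ones(3)
w = weakref.ref(t)  # "took out a weak reference to Tensor"
again = w()         # dereferencing it later yields the same PyObject

# Per the warning text, internal code that dereferences such a weakref is
# expected to call the private Tensor._fix_weakref() afterwards so the C++
# tensor and its Python wrapper stay in sync; ordinary test code rarely
# needs to touch this.
```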
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35394 2022-05-18T04:25:36.0217826Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35395 2022-05-18T04:25:36.9259622Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpodvs0u3i 2022-05-18T04:25:36.9260591Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpodvs0u3i/_remote_module_non_scriptable.py 2022-05-18T04:25:36.9473680Z dist init r=1, world=2 2022-05-18T04:25:36.9477623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:36.9512764Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq8jya8s8 2022-05-18T04:25:36.9515367Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq8jya8s8/_remote_module_non_scriptable.py 2022-05-18T04:25:36.9729086Z dist init r=0, world=2 2022-05-18T04:25:36.9733204Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:36.9734010Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:36.9784585Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:38.3170823Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:38.3171372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:38.5151867Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:38.5152478Z warnings.warn( 2022-05-18T04:25:38.5180148Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:38.5180694Z warnings.warn( 2022-05-18T04:25:38.5204080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:38.5222453Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:38.5223970Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:38.5307432Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:38.5359374Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:38.5360700Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:38.5361992Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:38.5363254Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:38.5364676Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:38.5365933Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:38.5839130Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:38.5839902Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:38.5840840Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:38.5841497Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:38.5968701Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:38.5974275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:38.5975249Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:38.6071857Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:39.0296771Z ok (3.032s) 2022-05-18T04:25:39.0431884Z test_mixture_of_experts_offload_false_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35497 2022-05-18T04:25:39.0538361Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35498 2022-05-18T04:25:39.9390389Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_067w73t 2022-05-18T04:25:39.9391748Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_067w73t/_remote_module_non_scriptable.py 2022-05-18T04:25:39.9541597Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk2u8vj2v 2022-05-18T04:25:39.9544230Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk2u8vj2v/_remote_module_non_scriptable.py 2022-05-18T04:25:39.9614355Z dist init r=1, world=2 2022-05-18T04:25:39.9618856Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:39.9758903Z dist init r=0, world=2 2022-05-18T04:25:39.9763102Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:39.9763917Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:39.9824452Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:41.3192597Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:41.3193137Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:41.5192202Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:41.5192802Z warnings.warn( 2022-05-18T04:25:41.5232096Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:41.5232628Z warnings.warn( 2022-05-18T04:25:41.5255394Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:41.5276347Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:41.5277057Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:41.5358331Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:41.5414381Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:41.5415685Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:41.5416959Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:41.5418233Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:41.5419707Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:41.5420995Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:41.5698448Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:41.5699134Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:41.5700057Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:41.5700711Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:41.5830501Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:41.5831015Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:41.5831687Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:41.5832791Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:41.9616121Z ok (2.932s) 2022-05-18T04:25:41.9750753Z test_mixture_of_experts_offload_false_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35584 2022-05-18T04:25:41.9856698Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35585 2022-05-18T04:25:42.8967183Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6tcknkb3 2022-05-18T04:25:42.8968192Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6tcknkb3/_remote_module_non_scriptable.py 2022-05-18T04:25:42.9192411Z dist init r=1, world=2 2022-05-18T04:25:42.9196845Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:42.9254131Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnc1tt518 2022-05-18T04:25:42.9256863Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnc1tt518/_remote_module_non_scriptable.py 2022-05-18T04:25:42.9470916Z dist init r=0, world=2 2022-05-18T04:25:42.9474920Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:42.9475716Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:42.9503607Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:44.3070834Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:44.5404305Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:44.5405352Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:44.5405919Z warnings.warn( 2022-05-18T04:25:44.5416507Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:44.5417079Z warnings.warn( 2022-05-18T04:25:44.5445752Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:44.5458516Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:44.5459209Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:44.5548974Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:44.5606099Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:44.5607384Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:44.5608805Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:44.5610076Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:44.5611376Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:44.5612642Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:44.5888855Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:44.5889540Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:44.5890936Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:44.5891583Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:44.6021356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:44.6021950Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:44.6022744Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:44.6023938Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:44.8935199Z ok (2.932s) 2022-05-18T04:25:44.9070155Z test_mixture_of_experts_offload_false_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35671 2022-05-18T04:25:44.9175810Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35672 2022-05-18T04:25:45.7879814Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8ru555rd 2022-05-18T04:25:45.7880853Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8ru555rd/_remote_module_non_scriptable.py 2022-05-18T04:25:45.8040359Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0ual6odd 2022-05-18T04:25:45.8043403Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0ual6odd/_remote_module_non_scriptable.py 2022-05-18T04:25:45.8094491Z dist init r=1, world=2 2022-05-18T04:25:45.8098453Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:45.8269388Z dist init r=0, world=2 2022-05-18T04:25:45.8274021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:45.8275099Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:45.8303406Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:47.1747874Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:47.1748383Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:47.3761333Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:47.3761953Z warnings.warn( 2022-05-18T04:25:47.3806941Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:47.3807532Z warnings.warn( 2022-05-18T04:25:47.3830540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:47.3852334Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:47.3853023Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:47.3933858Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:47.3990638Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:47.3991940Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:47.3993472Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:47.3994782Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:47.3996052Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:47.3997319Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:47.4277280Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:47.4278199Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:47.4280911Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:47.4281563Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:47.4410356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:47.4421957Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:47.4422675Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:47.4513371Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:47.8254601Z ok (2.932s) 2022-05-18T04:25:47.8388547Z test_mixture_of_experts_offload_false_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35758 2022-05-18T04:25:47.8496673Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35759 2022-05-18T04:25:48.7530156Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmyeg05_i 2022-05-18T04:25:48.7531475Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmyeg05_i/_remote_module_non_scriptable.py 2022-05-18T04:25:48.7759092Z dist init r=1, world=2 2022-05-18T04:25:48.7763434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:48.7860574Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp57ukts1a 2022-05-18T04:25:48.7863236Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp57ukts1a/_remote_module_non_scriptable.py 2022-05-18T04:25:48.8075929Z dist init r=0, world=2 2022-05-18T04:25:48.8079802Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:48.8080630Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:48.8172466Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:50.1602493Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:50.1603036Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:50.3645190Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:50.3645918Z warnings.warn( 2022-05-18T04:25:50.3658861Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:50.3659405Z warnings.warn( 2022-05-18T04:25:50.3685767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:50.3702978Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:50.3704478Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:50.3788585Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:50.3844938Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:50.3846280Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:50.3847557Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:50.3848829Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:50.3850085Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:50.3851341Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:50.4126138Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:50.4127415Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:50.4128725Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:50.4129404Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:50.4256977Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:50.4265275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:50.4266446Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:50.4360448Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:50.7572816Z ok (2.932s) 2022-05-18T04:25:50.7707285Z test_mixture_of_experts_offload_false_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35845 2022-05-18T04:25:50.7811512Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35846 2022-05-18T04:25:51.7314407Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5950h47j 2022-05-18T04:25:51.7315199Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5950h47j/_remote_module_non_scriptable.py 2022-05-18T04:25:51.7318690Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgf82obne 2022-05-18T04:25:51.7321660Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgf82obne/_remote_module_non_scriptable.py 2022-05-18T04:25:51.7540968Z dist init r=1, world=2 2022-05-18T04:25:51.7545513Z dist init r=0, world=2 2022-05-18T04:25:51.7545918Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:51.7549815Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:51.7551084Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:51.7649685Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:53.1180067Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:53.1180577Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:53.3189854Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:53.3190462Z warnings.warn( 2022-05-18T04:25:53.3224803Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:53.3225355Z warnings.warn( 2022-05-18T04:25:53.3248411Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:53.3269690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:53.3270487Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:53.3351595Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:53.3403945Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:53.3405287Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:53.3406565Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:53.3407838Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:53.3409100Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:53.3410477Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:53.3900405Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:53.3901124Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:53.3908970Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:53.3909660Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:53.4038559Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:53.4049186Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:53.4049875Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:53.4141603Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:53.7891444Z ok (3.032s) 2022-05-18T04:25:53.8022841Z test_mixture_of_experts_offload_false_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... 
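The UserWarning from fully_sharded_data_parallel.py:911 above means the module handed to FSDP still lives on CPU, so FSDP moves it to the local GPU for parameter verification, flattening, and sharding, then moves it back. A hedged sketch of avoiding that round-trip by placing the module on its CUDA device before wrapping; the Linear module and rank argument are placeholders, and an initialized process group is assumed (see the dist init lines in this log):

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(rank: int) -> FSDP:
        # Stand-in for the mixture-of-experts module exercised by these tests.
        model = nn.Linear(8, 8)
        # Moving the module to its CUDA device first should avoid the
        # "Module is input on CPU" warning, since FSDP no longer has to
        # relocate it for flattening and sharding.
        model = model.to(torch.device("cuda", rank))
        return FSDP(model)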
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35948 2022-05-18T04:25:53.8127610Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35949 2022-05-18T04:25:54.7992294Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuntik4pa 2022-05-18T04:25:54.7993042Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeo8xc5u4 2022-05-18T04:25:54.7993604Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuntik4pa/_remote_module_non_scriptable.py 2022-05-18T04:25:54.7996347Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeo8xc5u4/_remote_module_non_scriptable.py 2022-05-18T04:25:54.8207518Z dist init r=1, world=2 2022-05-18T04:25:54.8211313Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:54.8211700Z dist init r=0, world=2 2022-05-18T04:25:54.8215581Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:54.8216797Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:54.8314972Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:56.1611000Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:56.1611557Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:56.3534358Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:56.3534969Z warnings.warn( 2022-05-18T04:25:56.3611229Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:56.3612132Z warnings.warn( 2022-05-18T04:25:56.3636443Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:56.3652849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:56.3654164Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:56.3739717Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:56.3792117Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:56.3793896Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:56.3795787Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:56.3797065Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:56.3798330Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:56.3799737Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:56.4278913Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:56.4279651Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:56.4280600Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:56.4281260Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:56.4410376Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:56.4412223Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:56.4413754Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:56.4513792Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:56.8206358Z ok (3.031s) 2022-05-18T04:25:56.8339927Z test_mixture_of_experts_offload_false_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
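The "dist init r=..., world=2" lines and the store_based_barrier_key messages above are produced while each rank calls torch.distributed.init_process_group; the c10d store-based barrier is how the two ranks confirm they have both joined before the group is used. A minimal per-rank sketch under assumed env:// rendezvous settings (the MASTER_ADDR/MASTER_PORT values and the gloo backend are placeholders, not read from the test harness):

    import os
    import torch.distributed as dist

    def dist_init(rank: int, world_size: int = 2) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # placeholder rendezvous host
        os.environ.setdefault("MASTER_PORT", "29500")      # placeholder port
        # init_process_group runs the store-based barrier that logs
        # "Added key: store_based_barrier_key:1 to store for rank: <r>" and
        # "Completed store-based barrier ... with 2 nodes." for each rank.
        dist.init_process_group("gloo", rank=rank, world_size=world_size)
        print(f"dist init r={rank}, world={world_size}")
        dist.destroy_process_group()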
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36051 2022-05-18T04:25:56.8445300Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36052 2022-05-18T04:25:57.7560708Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppbgrrzen 2022-05-18T04:25:57.7561938Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppbgrrzen/_remote_module_non_scriptable.py 2022-05-18T04:25:57.7782453Z dist init r=0, world=2 2022-05-18T04:25:57.7787389Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:25:57.7933195Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpda2dj71u 2022-05-18T04:25:57.7936125Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpda2dj71u/_remote_module_non_scriptable.py 2022-05-18T04:25:57.8161642Z dist init r=1, world=2 2022-05-18T04:25:57.8165793Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:25:57.8166886Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:57.8195825Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:25:59.1707714Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:59.1708237Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:59.3731068Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:59.3731662Z warnings.warn( 2022-05-18T04:25:59.3732429Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:25:59.3732962Z warnings.warn( 2022-05-18T04:25:59.3774153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:25:59.3777175Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:25:59.3777917Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:59.3877134Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:25:59.3934679Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:59.3935973Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:25:59.3937249Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:59.3938660Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:59.3939910Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:59.3941162Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:25:59.4221663Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:59.4222341Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:59.4223280Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:25:59.4224216Z warnings.warn(msg, FutureWarning) 2022-05-18T04:25:59.4360410Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:25:59.4366385Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:25:59.4367401Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:59.4463475Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:25:59.7521728Z ok (2.931s) 2022-05-18T04:25:59.7652840Z test_mixture_of_experts_offload_false_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
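The repeated "[W python_variable.cpp:205]" warnings above concern PyTorch's PyObject preservation: if a weak reference to a Tensor is dereferenced and the internal Tensor._fix_weakref() is not called afterwards, the C++ tensor can be deallocated while Python-side references remain, and later accesses through that PyObject fail. A deliberately hedged illustration of the pattern the warning text describes; _fix_weakref() is an internal API, and whether the warning actually fires depends on PyTorch internals and reference timing:

    import weakref
    import torch

    t = torch.ones(3)
    wref = weakref.ref(t)      # take a weak reference to the Tensor

    resurrected = wref()       # dereference it while the Tensor is still alive
    if resurrected is not None:
        # The warning in this log suggests calling this internal helper after
        # dereferencing; on an ordinary Tensor it is effectively a no-op.
        resurrected._fix_weakref()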
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36138 2022-05-18T04:25:59.7760493Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36139 2022-05-18T04:26:00.6745768Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdtd2rzr4 2022-05-18T04:26:00.6747021Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdtd2rzr4/_remote_module_non_scriptable.py 2022-05-18T04:26:00.6967696Z dist init r=0, world=2 2022-05-18T04:26:00.6972061Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:00.7202218Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm9ok0rr9 2022-05-18T04:26:00.7205255Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm9ok0rr9/_remote_module_non_scriptable.py 2022-05-18T04:26:00.7432467Z dist init r=1, world=2 2022-05-18T04:26:00.7437269Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:00.7438091Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:00.7482831Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:02.0967012Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:02.0967855Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:02.2957675Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:02.2958267Z warnings.warn( 2022-05-18T04:26:02.2977585Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:02.2978117Z warnings.warn( 2022-05-18T04:26:02.3007066Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:02.3019809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:02.3020831Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:02.3110120Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:02.3167563Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:02.3168894Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:26:02.3170184Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:02.3171683Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:02.3172983Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:02.3174261Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:02.3457109Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:02.3457787Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:02.3458735Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:02.3459521Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:02.3594203Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:02.3603448Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:02.3604535Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:02.3698072Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:02.6835886Z ok (2.931s) 2022-05-18T04:26:02.6969259Z test_mixture_of_experts_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
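The test names in this block encode the FSDP configuration each parity case runs against DDP: CPU offload on or off, backward prefetch mode, sharding strategy (none, no_shard, shard_grad_op), and the gradient clipping norm type (2.0 or None). A hedged sketch of building one such FSDP variant; the option combination mirrors a name like offload_false_prefetch_pre_shard_grad_op_clip_norm_type_2_0, while the wrapped module is a placeholder:

    import torch.nn as nn
    from torch.distributed.fsdp import (
        BackwardPrefetch,
        CPUOffload,
        FullyShardedDataParallel as FSDP,
        ShardingStrategy,
    )

    def build_fsdp_variant(module: nn.Module) -> FSDP:
        # offload_false  -> CPUOffload(offload_params=False)
        # prefetch_pre   -> BackwardPrefetch.BACKWARD_PRE
        # shard_grad_op  -> ShardingStrategy.SHARD_GRAD_OP
        return FSDP(
            module,
            cpu_offload=CPUOffload(offload_params=False),
            backward_prefetch=BackwardPrefetch.BACKWARD_PRE,
            sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
        )

The *_clip_norm_type_2_0 variants additionally clip gradients after backward, e.g. via fsdp_model.clip_grad_norm_(max_norm=1.0, norm_type=2.0), where max_norm here is an arbitrary illustrative value.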
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36225 2022-05-18T04:26:02.7074013Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36226 2022-05-18T04:26:03.6143126Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2m95s8hx 2022-05-18T04:26:03.6144214Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2m95s8hx/_remote_module_non_scriptable.py 2022-05-18T04:26:03.6358804Z dist init r=0, world=2 2022-05-18T04:26:03.6362881Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:03.6388292Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbon7sc6e 2022-05-18T04:26:03.6391314Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbon7sc6e/_remote_module_non_scriptable.py 2022-05-18T04:26:03.6612459Z dist init r=1, world=2 2022-05-18T04:26:03.6616778Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:03.6617956Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:03.6670044Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:05.0182714Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:05.0183234Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:05.2190377Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:05.2191003Z warnings.warn( 2022-05-18T04:26:05.2220757Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:05.2221302Z warnings.warn( 2022-05-18T04:26:05.2245546Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:05.2264651Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:05.2265877Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:05.2348909Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:05.2405669Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:05.2407012Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:26:05.2408533Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:05.2409787Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:05.2411091Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:05.2412353Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:05.2686386Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:05.2687250Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:05.2688233Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:05.2689057Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:05.2817894Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:05.2826030Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:05.2827066Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:05.2920701Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:05.6162318Z ok (2.932s) 2022-05-18T04:26:05.6293201Z test_mixture_of_experts_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36312 2022-05-18T04:26:05.6397763Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36313 2022-05-18T04:26:06.5307530Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpchsqyf7g 2022-05-18T04:26:06.5308652Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpchsqyf7g/_remote_module_non_scriptable.py 2022-05-18T04:26:06.5335382Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpetbxmtgv 2022-05-18T04:26:06.5338602Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpetbxmtgv/_remote_module_non_scriptable.py 2022-05-18T04:26:06.5522138Z dist init r=0, world=2 2022-05-18T04:26:06.5526297Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:06.5563265Z dist init r=1, world=2 2022-05-18T04:26:06.5567946Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:06.5569160Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:06.5630357Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:07.8839927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:07.8840486Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:08.0837061Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:08.0837675Z warnings.warn( 2022-05-18T04:26:08.0873297Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:08.0873837Z warnings.warn( 2022-05-18T04:26:08.0896782Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:08.0917753Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:08.0918915Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:08.0999876Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:08.1056168Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:08.1057510Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:26:08.1059009Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:08.1060313Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:08.1061561Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:08.1062823Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:08.1341519Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:08.1342345Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:08.1343954Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:08.1344793Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:08.1472318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:08.1482022Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:08.1483083Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:08.1575182Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:08.5474402Z ok (2.931s) 2022-05-18T04:26:08.5610395Z test_mixture_of_experts_offload_true_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
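Each "Started process 0/1 with pid ..." pair above is the test harness in torch.testing._internal.common_distributed launching two worker processes, one per rank, before the parity check runs. A stand-alone, hedged sketch of launching a two-rank job the same general way with torch.multiprocessing; the worker body is a placeholder rather than the real test logic:

    import torch.multiprocessing as mp

    def _worker(rank: int, world_size: int) -> None:
        # The real workers perform the process-group init shown earlier and
        # then run the FSDP-vs-DDP comparison on their rank.
        print(f"dist init r={rank}, world={world_size}")

    if __name__ == "__main__":
        world_size = 2
        # spawn() starts `nprocs` processes and passes the process index as
        # the first argument to `_worker`.
        mp.spawn(_worker, args=(world_size,), nprocs=world_size, join=True)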
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36399 2022-05-18T04:26:08.5716698Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36400 2022-05-18T04:26:09.4435076Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnxguzxpl 2022-05-18T04:26:09.4436212Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnxguzxpl/_remote_module_non_scriptable.py 2022-05-18T04:26:09.4651897Z dist init r=1, world=2 2022-05-18T04:26:09.4655937Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:09.5008844Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9lk847ye 2022-05-18T04:26:09.5011565Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9lk847ye/_remote_module_non_scriptable.py 2022-05-18T04:26:09.5224482Z dist init r=0, world=2 2022-05-18T04:26:09.5228936Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:09.5230174Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:09.5267535Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:10.8438273Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:10.8439252Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:11.0447581Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:11.0448751Z warnings.warn( 2022-05-18T04:26:11.0450229Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:11.0451281Z warnings.warn( 2022-05-18T04:26:11.0488021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:11.0491592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:11.0492959Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:11.0544035Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0546653Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0549196Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0551683Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0591297Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:11.0640587Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0643730Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0646843Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0650176Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0725230Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:11.0732232Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:11.0732949Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:11.0828336Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:11.0895263Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0897851Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0899858Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0901159Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0902414Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.0904045Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.1512006Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:11.1513471Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:11.1515287Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:11.1516551Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:11.1640499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:11.1645894Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:11.1647516Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:11.1674861Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.1677466Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.1680086Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:26:11.1744079Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:11.1769804Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.1773241Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.1776343Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:11.5795064Z ok (3.032s) 2022-05-18T04:26:11.5929468Z test_mixture_of_experts_offload_true_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36504 2022-05-18T04:26:11.6036673Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36505 2022-05-18T04:26:12.4979224Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpevc9c9kh 2022-05-18T04:26:12.4981253Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpevc9c9kh/_remote_module_non_scriptable.py 2022-05-18T04:26:12.5203946Z dist init r=1, world=2 2022-05-18T04:26:12.5208679Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:12.5389824Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxnc10odq 2022-05-18T04:26:12.5392569Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxnc10odq/_remote_module_non_scriptable.py 2022-05-18T04:26:12.5608186Z dist init r=0, world=2 2022-05-18T04:26:12.5612635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:12.5613560Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:12.5617061Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:13.8964143Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:13.8964907Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:14.0991612Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:26:14.0992199Z warnings.warn( 2022-05-18T04:26:14.1049850Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:14.1050393Z warnings.warn( 2022-05-18T04:26:14.1074539Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:14.1095102Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:14.1095875Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:14.1147446Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1149003Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1150295Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1151564Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1177851Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:14.1226420Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1227696Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1228983Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1230260Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1314613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:14.1319429Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:14.1320203Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:14.1417586Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:14.1486054Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1487481Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1488750Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1490183Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1491443Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.1492710Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.2101024Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:14.2101738Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:14.2108445Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:14.2109122Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:14.2235285Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:14.2245830Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:14.2246698Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:14.2274461Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.2275780Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.2277053Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.2338425Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:14.2364151Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.2365426Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.2366832Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:14.6117009Z ok (3.032s) 2022-05-18T04:26:14.6249960Z test_mixture_of_experts_offload_true_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
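The FutureWarning above points at a concrete migration: replace torch.testing.assert_allclose() with torch.testing.assert_close(). A minimal sketch of that change, assuming a PyTorch release where assert_close is available (it is in the 1.12-era build running here); the tensors are illustrative only:

import torch

actual = torch.tensor([1.0, 2.0, 3.0])
expected = actual + 1e-8

# Deprecated call that produces the FutureWarning seen in this log:
# torch.testing.assert_allclose(actual, expected)

# Replacement recommended by the warning; rtol/atol defaults depend on dtype.
torch.testing.assert_close(actual, expected)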
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36609 2022-05-18T04:26:14.6355090Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36610 2022-05-18T04:26:15.5251201Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd26f8ih8 2022-05-18T04:26:15.5252572Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd26f8ih8/_remote_module_non_scriptable.py 2022-05-18T04:26:15.5475143Z dist init r=1, world=2 2022-05-18T04:26:15.5479648Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:15.5658324Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqcrqi7jk 2022-05-18T04:26:15.5660858Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqcrqi7jk/_remote_module_non_scriptable.py 2022-05-18T04:26:15.5874284Z dist init r=0, world=2 2022-05-18T04:26:15.5878340Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:15.5879439Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:15.5888069Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:16.9062632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:16.9063168Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:17.1066035Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:17.1066913Z warnings.warn( 2022-05-18T04:26:17.1084393Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:17.1084940Z warnings.warn( 2022-05-18T04:26:17.1108929Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:17.1125755Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:17.1126448Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:17.1178589Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1179876Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1181301Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1182548Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1212275Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:17.1263589Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1265058Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1266337Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1278019Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1347330Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:17.1354884Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:17.1355737Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:17.1450215Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:17.1523233Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1524565Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1525842Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1527105Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1528510Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1529775Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.1899673Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:17.1900390Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:17.1901552Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:17.1902210Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:17.2024368Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:17.2028267Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:17.2028945Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:17.2054113Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.2055542Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.2056833Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:26:17.2127005Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:17.2151385Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.2152679Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.2153925Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:17.5432402Z ok (2.931s) 2022-05-18T04:26:17.5565771Z test_mixture_of_experts_offload_true_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36698 2022-05-18T04:26:17.5672947Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36699 2022-05-18T04:26:18.4690767Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwl2j1h21 2022-05-18T04:26:18.4692125Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwl2j1h21/_remote_module_non_scriptable.py 2022-05-18T04:26:18.4914628Z dist init r=1, world=2 2022-05-18T04:26:18.4919064Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:18.5100906Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpegm8zkw2 2022-05-18T04:26:18.5103971Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpegm8zkw2/_remote_module_non_scriptable.py 2022-05-18T04:26:18.5322830Z dist init r=0, world=2 2022-05-18T04:26:18.5327299Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:18.5328083Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:18.5328799Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:19.8722499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:19.8723106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:20.0732135Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:26:20.0733054Z warnings.warn( 2022-05-18T04:26:20.0743158Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:20.0744025Z warnings.warn( 2022-05-18T04:26:20.0774460Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:20.0788086Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:20.0788794Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:20.0842993Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.0844304Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.0845809Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.0847237Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.0877333Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:20.0931416Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.0932697Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.0933980Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.0935249Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1021473Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:20.1030025Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:20.1030750Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:20.1124419Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:20.1201262Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1202577Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1203860Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1205785Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1208273Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1210957Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1589416Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:20.1590145Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:20.1591090Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:20.1591760Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:20.1725270Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:20.1725790Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:20.1726505Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:20.1727212Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:20.1752869Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1754171Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1755583Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1756856Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1758126Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.1759403Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:20.4750000Z ok (2.932s) 2022-05-18T04:26:20.4883227Z test_mixture_of_experts_offload_true_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
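The store_based_barrier_key INFO lines logged around each test come from torch.distributed process-group setup: init_process_group (and subsequent new_group calls, which bump the key number) run a store-based barrier so every rank finishes initialization before the test body proceeds. A minimal sketch of a two-rank setup that would emit the same kind of log lines; the backend, address, and port here are illustrative assumptions, not values taken from this job:

import os
import torch.distributed as dist

def init(rank: int, world_size: int) -> None:
    # Rendezvous via a TCP store built from these environment variables.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # Logs "Added key: store_based_barrier_key:1 to store for rank: ..." and
    # "Completed store-based barrier ..." once all ranks have checked in.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)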
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36787 2022-05-18T04:26:20.4987770Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36788 2022-05-18T04:26:21.3863849Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwe9vw44c 2022-05-18T04:26:21.3865317Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwe9vw44c/_remote_module_non_scriptable.py 2022-05-18T04:26:21.4088847Z dist init r=1, world=2 2022-05-18T04:26:21.4093465Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:21.4232394Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_su8trtb 2022-05-18T04:26:21.4235021Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_su8trtb/_remote_module_non_scriptable.py 2022-05-18T04:26:21.4448366Z dist init r=0, world=2 2022-05-18T04:26:21.4452245Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:21.4453345Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:21.4502358Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:22.8004774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:22.8005393Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:22.9987860Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:22.9989027Z warnings.warn( 2022-05-18T04:26:23.0015387Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:23.0016525Z warnings.warn( 2022-05-18T04:26:23.0040319Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:23.0060409Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:23.0062035Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:23.0116377Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0119004Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0121560Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0124052Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0143419Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:23.0196440Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0199597Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0202666Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0205738Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0280172Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:23.0286883Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:23.0287593Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:23.0384027Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:23.0458395Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0461185Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0463179Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0464905Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0466174Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0467437Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0833646Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:23.0835027Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:23.0836816Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:23.0838060Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:23.0961483Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:23.0965715Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:23.0967059Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:23.0993505Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0996090Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.0998690Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:26:23.1065003Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:23.1089720Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.1092878Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.1095953Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:23.4064755Z ok (2.931s) 2022-05-18T04:26:23.4196873Z test_mixture_of_experts_offload_true_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36876 2022-05-18T04:26:23.4303537Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36877 2022-05-18T04:26:24.3228627Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmsb8uf7u 2022-05-18T04:26:24.3230095Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmsb8uf7u/_remote_module_non_scriptable.py 2022-05-18T04:26:24.3446613Z dist init r=1, world=2 2022-05-18T04:26:24.3451468Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:24.3690638Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqcupdp__ 2022-05-18T04:26:24.3693015Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqcupdp__/_remote_module_non_scriptable.py 2022-05-18T04:26:24.3905016Z dist init r=0, world=2 2022-05-18T04:26:24.3909260Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:24.3910071Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:24.3961864Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:25.7211337Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:25.7212309Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:25.9186516Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:26:25.9187630Z warnings.warn( 2022-05-18T04:26:25.9211610Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:25.9212740Z warnings.warn( 2022-05-18T04:26:25.9236559Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:25.9256740Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:25.9258072Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:25.9312549Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9315188Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9317735Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9320263Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9340006Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:25.9393075Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9396499Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9399594Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9402647Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9477315Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:25.9484980Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:25.9485704Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:25.9581023Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:25.9655741Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9658340Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9660561Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9661882Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9663150Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:25.9664893Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:26.0034639Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:26.0036273Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:26.0038050Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:26.0039300Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:26.0163375Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:26.0165300Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:26.0167033Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:26.0193839Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:26.0196424Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:26.0199004Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:26.0266623Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:26.0291667Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:26.0295032Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:26.0298132Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:26.3381049Z ok (2.931s) 2022-05-18T04:26:26.3516760Z test_mixture_of_experts_offload_true_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
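The repeated python_variable.cpp warning above describes a specific pattern: a Tensor whose PyObject is reached again through a weak reference and is later deallocated without Tensor._fix_weakref() being called. A heavily hedged sketch of the sequence the warning message describes; it only illustrates the advice in the warning text and is not the code path exercised by these tests:

import weakref
import torch

t = torch.zeros(4)
ref = weakref.ref(t)   # take a weak reference to the tensor
alias = ref()          # dereference it, obtaining a strong reference again
alias._fix_weakref()   # the call the warning message asks for after dereferencing
del alias, t           # tensor deallocation happens once both references drop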
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36965 2022-05-18T04:26:26.3629475Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36966 2022-05-18T04:26:27.3115564Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyfbh8hk2 2022-05-18T04:26:27.3116410Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyfbh8hk2/_remote_module_non_scriptable.py 2022-05-18T04:26:27.3176471Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5lxxndsu 2022-05-18T04:26:27.3179275Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5lxxndsu/_remote_module_non_scriptable.py 2022-05-18T04:26:27.3334909Z dist init r=1, world=2 2022-05-18T04:26:27.3339012Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:27.3400236Z dist init r=0, world=2 2022-05-18T04:26:27.3404382Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:27.3405407Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:27.3442447Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:28.6932562Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:28.6933075Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:28.8932052Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:28.8932719Z warnings.warn( 2022-05-18T04:26:28.8977985Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:28.8978542Z warnings.warn( 2022-05-18T04:26:28.9001963Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:28.9021305Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:28.9022333Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:28.9073467Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9075014Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9076316Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9077585Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9105207Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:28.9154154Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9155447Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9156888Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9158150Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9240588Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:28.9245627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:28.9246350Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:28.9343572Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:28.9412133Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9413463Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9414776Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9416232Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9417524Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:28.9418789Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:29.0026057Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:29.0026781Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:29.0028478Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:29.0029382Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:29.0155221Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:29.0164626Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:29.0165676Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:29.0192444Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:29.0193746Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:29.0195029Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:26:29.0258197Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:29.0284128Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:29.0285423Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:29.0286897Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:29.4709704Z ok (3.133s) 2022-05-18T04:26:29.4843961Z test_mixture_of_experts_offload_true_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37070 2022-05-18T04:26:29.4950108Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37071 2022-05-18T04:26:30.4218720Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1016z9tq 2022-05-18T04:26:30.4219869Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1016z9tq/_remote_module_non_scriptable.py 2022-05-18T04:26:30.4370168Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5o3o41pp 2022-05-18T04:26:30.4372861Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5o3o41pp/_remote_module_non_scriptable.py 2022-05-18T04:26:30.4436159Z dist init r=1, world=2 2022-05-18T04:26:30.4440372Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:30.4594299Z dist init r=0, world=2 2022-05-18T04:26:30.4598492Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:30.4599671Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:30.4645786Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:31.8115008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:31.8115615Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:32.0111383Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
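The UserWarning from fully_sharded_data_parallel.py:911 above is emitted because the test hands a CPU-resident module to FSDP, which then moves it to the local GPU and back. A minimal sketch of how a caller could avoid that round trip, assuming an already-initialized 2-rank process group with one GPU per rank (illustrative only, not the test suite's own code):

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_device(module: torch.nn.Module) -> FSDP:
        # Move the module to this rank's GPU before wrapping, so FSDP does not
        # need to shuttle parameters between CPU and GPU during construction.
        local_rank = dist.get_rank()  # assumes one GPU per rank, as in this 2-rank job
        module = module.to(torch.device("cuda", local_rank))
        return FSDP(module)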
2022-05-18T04:26:32.0112009Z warnings.warn( 2022-05-18T04:26:32.0141138Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:32.0141706Z warnings.warn( 2022-05-18T04:26:32.0165351Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:32.0185237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:32.0186343Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:32.0237595Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0238893Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0240360Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0241650Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0267535Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:32.0315816Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0317100Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0318361Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0319776Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0403587Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:32.0408618Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:32.0409696Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:32.0506748Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:32.0574380Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0575699Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0576981Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0578253Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0580514Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.0582964Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.1186518Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:32.1187249Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:32.1188389Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:32.1189048Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:32.1318010Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:32.1327133Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:32.1328263Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:32.1354699Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.1356128Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.1357413Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.1421056Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:32.1446272Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.1447550Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.1448823Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:32.5028385Z ok (3.032s) 2022-05-18T04:26:32.5161204Z test_mixture_of_experts_offload_true_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
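The FutureWarning repeated throughout this run names the documented replacement for torch.testing.assert_allclose(). A small, self-contained example of the migration it asks for (the tensors and tolerances here are made up for illustration):

    import torch

    a = torch.tensor([1.0, 2.0, 3.0])
    b = torch.tensor([1.0, 2.0, 3.0 + 1e-7])

    # Deprecated spelling that triggers the FutureWarning seen in this log:
    # torch.testing.assert_allclose(a, b)

    # Replacement named by the warning:
    torch.testing.assert_close(a, b)

    # Explicit tolerances can be passed if the defaults are too strict:
    torch.testing.assert_close(a, b, rtol=1e-5, atol=1e-6)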
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37175 2022-05-18T04:26:32.5265870Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37176 2022-05-18T04:26:33.4721917Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiki1uf5r 2022-05-18T04:26:33.4723202Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiki1uf5r/_remote_module_non_scriptable.py 2022-05-18T04:26:33.4727610Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6gdvkue5 2022-05-18T04:26:33.4730291Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6gdvkue5/_remote_module_non_scriptable.py 2022-05-18T04:26:33.4937860Z dist init r=0, world=2 2022-05-18T04:26:33.4942153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:33.4946539Z dist init r=1, world=2 2022-05-18T04:26:33.4950640Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:33.4951672Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:33.5046031Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:34.8332106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:34.8332681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:35.0306693Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:35.0307305Z warnings.warn( 2022-05-18T04:26:35.0331370Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:35.0331908Z warnings.warn( 2022-05-18T04:26:35.0356427Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:35.0373748Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:35.0374565Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:35.0426474Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0427765Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0429043Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0430506Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0459779Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:35.0511767Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0513048Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0514325Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0515591Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0597519Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:35.0605126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:35.0606163Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:35.0701072Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:35.0774993Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0776338Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0777625Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0778896Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0780159Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.0781591Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.1158191Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:35.1158865Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:35.1159804Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:35.1160468Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:35.1287881Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:35.1288404Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:35.1289074Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:35.1289922Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:35.1314025Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.1315324Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.1316600Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.1317875Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.1319119Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.1320380Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:35.4342786Z ok (2.931s) 2022-05-18T04:26:35.4474829Z test_mixture_of_experts_offload_true_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37264 2022-05-18T04:26:35.4582287Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37265 2022-05-18T04:26:36.3546021Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptz50ls_e 2022-05-18T04:26:36.3547101Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptz50ls_e/_remote_module_non_scriptable.py 2022-05-18T04:26:36.3670830Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwvs6j94s 2022-05-18T04:26:36.3673728Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwvs6j94s/_remote_module_non_scriptable.py 2022-05-18T04:26:36.3762355Z dist init r=1, world=2 2022-05-18T04:26:36.3766264Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:36.3897695Z dist init r=0, world=2 2022-05-18T04:26:36.3902098Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:36.3903310Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:36.3971458Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:37.7383873Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:37.7384651Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:37.9379671Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
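The "dist init r=N, world=2" and "store_based_barrier_key" INFO lines above come from per-rank process-group setup: each rank joins a 2-process group, and torch.distributed runs a store-based barrier whenever a group is created. A minimal sketch of that setup, assuming a single machine and a free TCP port (not the harness's actual code):

    import os
    import torch.distributed as dist

    def init(rank: int, world_size: int = 2) -> None:
        # MASTER_ADDR/MASTER_PORT are required when no init_method is given.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
        # Creating further groups (as the FSDP tests do) produces additional
        # store_based_barrier_key:2, :3, ... entries, one per new group.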
2022-05-18T04:26:37.9380276Z warnings.warn( 2022-05-18T04:26:37.9402489Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:37.9403040Z warnings.warn( 2022-05-18T04:26:37.9426865Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:37.9447011Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:37.9447958Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:37.9502548Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9504090Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9505384Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9506653Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9530439Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:37.9583066Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9584639Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9585922Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9587190Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9670513Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:37.9675925Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:37.9676767Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:37.9773479Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:37.9850276Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9851595Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9853548Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9855998Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9858463Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:37.9860958Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:38.0247246Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:38.0248313Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:38.0249603Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:38.0250266Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:38.0375669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:38.0384138Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:38.0385120Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:38.0412064Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:38.0413532Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:38.0414795Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:38.0478413Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:38.0502524Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:38.0504038Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:38.0505333Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:38.3657404Z ok (2.931s) 2022-05-18T04:26:38.3793966Z test_mixture_of_experts_offload_true_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
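The repeated "[W python_variable.cpp:205]" lines describe a tensor being deallocated while its Python object is still reachable through a weak reference. Purely as an illustration of the scenario the warning text itself describes, using the private _fix_weakref() call the message names (this is not code from the test suite):

    import weakref
    import torch

    t = torch.zeros(4)
    ref = weakref.ref(t)   # take a weak reference to the tensor
    _ = ref()              # dereference it
    t._fix_weakref()       # the call the warning asks for after dereferencing (private API)
    del t                  # deallocation then proceeds without the live-PyObject warning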
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37353 2022-05-18T04:26:38.3898552Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37354 2022-05-18T04:26:39.2828909Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbe0wcswr 2022-05-18T04:26:39.2830226Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbe0wcswr/_remote_module_non_scriptable.py 2022-05-18T04:26:39.2952609Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn9uwf233 2022-05-18T04:26:39.2955663Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn9uwf233/_remote_module_non_scriptable.py 2022-05-18T04:26:39.3053799Z dist init r=1, world=2 2022-05-18T04:26:39.3058539Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:39.3168638Z dist init r=0, world=2 2022-05-18T04:26:39.3172972Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:39.3174013Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:39.3264669Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:40.6747128Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:40.6748009Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:40.8742632Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:40.8744383Z warnings.warn( 2022-05-18T04:26:40.8780063Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:40.8781637Z warnings.warn( 2022-05-18T04:26:40.8805682Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:40.8820604Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:40.8821687Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:40.8873667Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.8874973Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.8876247Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.8877514Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.8909245Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:40.8960301Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.8962023Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.8963348Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.8964613Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.9043429Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:40.9050465Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:40.9051170Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:40.9146129Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:40.9218569Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.9220114Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.9221682Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.9223068Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.9224586Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.9226144Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.9591883Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:40.9592767Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:40.9593899Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:40.9594800Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:40.9718633Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:40.9719560Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:40.9720623Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:40.9744720Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.9746029Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.9747514Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:26:40.9821259Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:40.9845420Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.9846707Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:40.9847982Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:41.2975081Z ok (2.932s) 2022-05-18T04:26:41.3108876Z test_mixture_of_experts_offload_true_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37442 2022-05-18T04:26:41.3214535Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37443 2022-05-18T04:26:42.2233533Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy8xysahp 2022-05-18T04:26:42.2234381Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy8xysahp/_remote_module_non_scriptable.py 2022-05-18T04:26:42.2448385Z dist init r=0, world=2 2022-05-18T04:26:42.2452332Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:42.2652897Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphruc32fn 2022-05-18T04:26:42.2655965Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphruc32fn/_remote_module_non_scriptable.py 2022-05-18T04:26:42.2879343Z dist init r=1, world=2 2022-05-18T04:26:42.2883854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:42.2884886Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:42.2962336Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:43.6427684Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:43.6428222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:43.8424197Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
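The test names in this block encode their FSDP configuration: "offload_true" (CPU parameter offload), "prefetch_post"/"prefetch_pre" (backward prefetch), "no_shard"/"shard_grad_op"/"none" (sharding strategy), and "clip_norm_type_2_0"/"None" (gradient clipping). A sketch of how those options map onto the public FSDP API, assuming the names correspond one-to-one to these constructor arguments (the mapping is inferred from the names, not read from the test file):

    import torch
    from torch.distributed.fsdp import (
        FullyShardedDataParallel as FSDP,
        CPUOffload,
        BackwardPrefetch,
        ShardingStrategy,
    )

    def build(module: torch.nn.Module) -> FSDP:
        return FSDP(
            module,
            cpu_offload=CPUOffload(offload_params=True),       # "offload_true"
            backward_prefetch=BackwardPrefetch.BACKWARD_POST,  # "prefetch_post"
            sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,  # "shard_grad_op"
        )

    def step(model: FSDP, optimizer: torch.optim.Optimizer, loss: torch.Tensor) -> None:
        loss.backward()
        # "clip_norm_type_2_0": L2 gradient clipping. With sharded parameters the
        # tests need a shard-aware global norm; this plain call is only illustrative.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0, norm_type=2.0)
        optimizer.step()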
2022-05-18T04:26:43.8424816Z warnings.warn( 2022-05-18T04:26:43.8454834Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:43.8455368Z warnings.warn( 2022-05-18T04:26:43.8478703Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:43.8500220Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:43.8501491Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:43.8556653Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8558021Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8559304Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8560576Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8581937Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:43.8633614Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8634906Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8636352Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8637633Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8720187Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:43.8725248Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:43.8725942Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:43.8823202Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:43.8898676Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8900456Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8901741Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8903001Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8904470Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.8905728Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.9284393Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:43.9285087Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:43.9288199Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:43.9288874Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:43.9414469Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:43.9424941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:43.9426194Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:43.9451746Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.9453069Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.9454343Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.9517364Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:43.9541979Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.9543295Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:43.9544847Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:44.3292357Z ok (3.032s) 2022-05-18T04:26:44.3426845Z test_mixture_of_experts_offload_true_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
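
[editor's note] The FutureWarning repeated in this block states that torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14, pointing to torch.testing.assert_close() instead. A minimal sketch of that migration; the tensor values are placeholders:

# Sketch of the migration requested by the FutureWarning above.
import torch

actual = torch.tensor([1.0, 2.0, 3.0])
expected = torch.tensor([1.0, 2.0, 3.0])

# Deprecated since 1.12 (emits the FutureWarning seen in this log):
# torch.testing.assert_allclose(actual, expected)

# Replacement suggested by the warning:
torch.testing.assert_close(actual, expected)
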
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37531 2022-05-18T04:26:44.3531922Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37532 2022-05-18T04:26:45.2496533Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbss83s1z 2022-05-18T04:26:45.2497545Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbss83s1z/_remote_module_non_scriptable.py 2022-05-18T04:26:45.2532861Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp528_6cuo 2022-05-18T04:26:45.2535933Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp528_6cuo/_remote_module_non_scriptable.py 2022-05-18T04:26:45.2710988Z dist init r=0, world=2 2022-05-18T04:26:45.2715227Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:45.2762269Z dist init r=1, world=2 2022-05-18T04:26:45.2766934Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:45.2768209Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:45.2819042Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:46.6186595Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:46.6187110Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:46.8249445Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:46.8250565Z warnings.warn( 2022-05-18T04:26:46.8278499Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:46.8279048Z warnings.warn( 2022-05-18T04:26:46.8301502Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:46.8323207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:46.8324080Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:46.8375433Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8376720Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8378004Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8379271Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8405062Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:46.8453786Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8455102Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8456391Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8457829Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8539382Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:46.8543801Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:46.8544738Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:46.8642440Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:46.8710787Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8712080Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8713478Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8714758Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8716761Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.8719301Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.9341804Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:46.9342521Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:46.9349303Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:46.9349992Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:46.9475867Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:46.9486210Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:46.9487393Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:46.9514221Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.9515549Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.9517036Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:26:46.9578901Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:46.9604126Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.9605562Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:46.9606849Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:47.3611552Z ok (3.032s) 2022-05-18T04:26:47.3744671Z test_mixture_of_experts_offload_true_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37636 2022-05-18T04:26:47.3852000Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37637 2022-05-18T04:26:48.2949972Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk83zjd_h 2022-05-18T04:26:48.2950821Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk83zjd_h/_remote_module_non_scriptable.py 2022-05-18T04:26:48.3167137Z dist init r=1, world=2 2022-05-18T04:26:48.3171381Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:48.3181788Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoyt3_i5f 2022-05-18T04:26:48.3184738Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoyt3_i5f/_remote_module_non_scriptable.py 2022-05-18T04:26:48.3400453Z dist init r=0, world=2 2022-05-18T04:26:48.3404843Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:48.3405983Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:48.3478614Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:49.6772505Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:49.6773375Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:49.8772552Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
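
[editor's note] The test names in this block (for example test_mixture_of_experts_offload_true_prefetch_pre_none_clip_norm_type_2_0) parametrize FSDP over CPU offload, backward prefetch, sharding strategy, and gradient-clipping norm type. A hedged sketch of how such knobs map onto the FSDP constructor and FSDP.clip_grad_norm_; the exact mapping used by TestParityWithDDP is an assumption here, not read from the test source:

# Hedged sketch: FSDP options that the parametrized test names above appear to
# exercise. The name-to-argument mapping is an assumption. Assumes a default
# process group has already been initialized.
import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    CPUOffload,
    BackwardPrefetch,
    ShardingStrategy,
)

def build_fsdp(model: torch.nn.Module) -> FSDP:
    return FSDP(
        model,
        cpu_offload=CPUOffload(offload_params=True),       # "offload_true"
        backward_prefetch=BackwardPrefetch.BACKWARD_PRE,    # "prefetch_pre"
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,   # "shard_grad_op"
    )

# "clip_norm_type_2_0" style cases would then clip gradients after backward:
# fsdp_model.clip_grad_norm_(max_norm=1.0, norm_type=2.0)
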
2022-05-18T04:26:49.8773176Z warnings.warn( 2022-05-18T04:26:49.8777147Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:49.8777698Z warnings.warn( 2022-05-18T04:26:49.8813199Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:49.8819228Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:49.8820347Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:49.8869305Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.8870849Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.8872111Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.8873371Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.8916412Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:49.8965198Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.8966480Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.8967743Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.8968996Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9051202Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:49.9058308Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:49.9059534Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:49.9154332Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:49.9220674Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9222142Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9223429Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9225124Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9226395Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9227644Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9837563Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:49.9838271Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:49.9839225Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:49.9839879Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:49.9966510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:49.9967206Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:49.9967897Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:49.9968597Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:49.9992785Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9994134Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9995432Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9996701Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9997974Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:49.9999370Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:50.3930512Z ok (3.032s) 2022-05-18T04:26:50.4066840Z test_mixture_of_experts_offload_true_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
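
[editor's note] The "Added key: store_based_barrier_key:N" and "Completed store-based barrier" INFO lines above are emitted by torch.distributed's c10d layer while the ranks rendezvous during process-group initialization. A minimal per-rank sketch of the kind of 2-process ("world=2") setup that produces this output; the env:// style rendezvous and the nccl backend are assumptions, since the test harness manages its own store and ports:

# Minimal sketch of a 2-rank init that logs store-based-barrier messages like
# the ones above. MASTER_ADDR/MASTER_PORT values are placeholders.
import os
import torch.distributed as dist

def init(rank: int, world_size: int = 2) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # init_process_group performs the store-based barrier that produces the
    # "Added key: store_based_barrier_key:1" and
    # "Completed store-based barrier" INFO lines for each rank.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    print(f"dist init r={rank}, world={world_size}")
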
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37741 2022-05-18T04:26:50.4176769Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37742 2022-05-18T04:26:51.3182578Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph22a6klf 2022-05-18T04:26:51.3183537Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph22a6klf/_remote_module_non_scriptable.py 2022-05-18T04:26:51.3377439Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbgbaz8cz 2022-05-18T04:26:51.3379887Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbgbaz8cz/_remote_module_non_scriptable.py 2022-05-18T04:26:51.3405695Z dist init r=0, world=2 2022-05-18T04:26:51.3410020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:51.3593251Z dist init r=1, world=2 2022-05-18T04:26:51.3597595Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:51.3598419Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:51.3615132Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:52.7261421Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:52.7261971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:52.9249204Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:52.9249870Z warnings.warn( 2022-05-18T04:26:52.9277042Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:52.9277610Z warnings.warn( 2022-05-18T04:26:52.9300740Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:52.9320333Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:52.9321384Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:52.9373770Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9375095Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9376612Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9377892Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9403845Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:52.9455357Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9456636Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9457921Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9459181Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9540351Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:52.9547172Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:52.9547864Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:52.9643401Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:52.9717487Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9718786Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9720059Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9721321Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9722704Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:52.9723948Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:53.0106597Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:53.0107287Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:53.0109122Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:53.0109784Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:53.0232557Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:53.0240439Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:53.0241267Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:53.0267308Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:53.0268697Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:53.0269979Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:26:53.0335555Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:53.0360028Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:53.0361309Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:53.0362574Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:53.4255355Z ok (3.032s) 2022-05-18T04:26:53.4389346Z test_mixture_of_experts_offload_true_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37830 2022-05-18T04:26:53.4495076Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37831 2022-05-18T04:26:54.3780530Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7aixk59f 2022-05-18T04:26:54.3781424Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7aixk59f/_remote_module_non_scriptable.py 2022-05-18T04:26:54.3946019Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkt0_3k2n 2022-05-18T04:26:54.3949166Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkt0_3k2n/_remote_module_non_scriptable.py 2022-05-18T04:26:54.3997643Z dist init r=0, world=2 2022-05-18T04:26:54.4001725Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:54.4172283Z dist init r=1, world=2 2022-05-18T04:26:54.4176781Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:54.4178293Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:54.4207033Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:55.7639558Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:55.7640132Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:55.9654770Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:26:55.9655389Z warnings.warn( 2022-05-18T04:26:55.9659662Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:55.9660215Z warnings.warn( 2022-05-18T04:26:55.9698329Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:55.9702391Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:55.9703098Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:55.9755830Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:55.9757142Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:55.9758408Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:55.9759846Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:55.9801289Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:55.9855996Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:55.9857283Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:55.9858572Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:55.9859849Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:55.9943940Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:55.9953635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:55.9954359Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:56.0046840Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:56.0121596Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.0122921Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.0124206Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.0125479Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.0126748Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.0128110Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.0518654Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:56.0519447Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:56.0522517Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:56.0523203Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:56.0648724Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:56.0658141Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:56.0659002Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:56.0685598Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.0686894Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.0688280Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.0751840Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:56.0777058Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.0778354Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.0779611Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:56.4575278Z ok (3.032s) 2022-05-18T04:26:56.4708839Z test_mixture_of_experts_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
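
[editor's note] The repeated [W python_variable.cpp:205] warning in this block describes deallocating a Tensor whose PyObject is still referenced, after a weak reference was taken and dereferenced without calling _fix_weakref(). A small, heavily hedged illustration of the pattern the warning text itself describes; _fix_weakref() is a private Tensor method, and whether this exact snippet reproduces or silences the warning depends on internals, so treat it as an illustration of the message rather than a repro or a fix:

# Illustration of the pattern named by the python_variable.cpp warning above:
# take a weak reference to a Tensor, dereference it, and call _fix_weakref()
# afterwards as the message suggests. Mirrors the warning text only.
import weakref
import torch

t = torch.ones(4)
ref = weakref.ref(t)             # weak reference to the Tensor's PyObject

resurrected = ref()              # dereference the weak reference again
if resurrected is not None:
    resurrected._fix_weakref()   # the call the warning says was missing
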
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37919 2022-05-18T04:26:56.4813178Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37920 2022-05-18T04:26:57.3703128Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7hrywmm0 2022-05-18T04:26:57.3704450Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7hrywmm0/_remote_module_non_scriptable.py 2022-05-18T04:26:57.3926896Z dist init r=1, world=2 2022-05-18T04:26:57.3930975Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:26:57.4018930Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprbxpbgwk 2022-05-18T04:26:57.4021926Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprbxpbgwk/_remote_module_non_scriptable.py 2022-05-18T04:26:57.4234099Z dist init r=0, world=2 2022-05-18T04:26:57.4238237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:26:57.4239314Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:57.4339710Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:26:58.7824572Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:58.7825112Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:58.9836183Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:58.9836788Z warnings.warn( 2022-05-18T04:26:58.9867266Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:26:58.9867813Z warnings.warn( 2022-05-18T04:26:58.9890831Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:26:58.9911307Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:26:58.9912194Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:58.9966773Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:58.9968089Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:58.9969366Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:58.9970639Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:58.9994363Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:26:59.0045597Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0046882Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0048151Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0049422Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0131869Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:26:59.0136734Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:26:59.0137410Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:59.0234826Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:26:59.0309950Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0311409Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0312688Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0313962Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0315222Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0316487Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0695111Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:59.0696019Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:59.0698276Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:26:59.0698943Z warnings.warn(msg, FutureWarning) 2022-05-18T04:26:59.0825532Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:26:59.0835906Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:26:59.0836592Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:59.0862460Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0864011Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0865290Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:26:59.0928756Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:26:59.0953563Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0954865Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.0956144Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:26:59.4892152Z ok (3.032s) 2022-05-18T04:26:59.5024631Z test_mixture_of_experts_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38008 2022-05-18T04:26:59.5134113Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38009 2022-05-18T04:27:00.4180319Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwf1cb9kf 2022-05-18T04:27:00.4181691Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwf1cb9kf/_remote_module_non_scriptable.py 2022-05-18T04:27:00.4396050Z dist init r=0, world=2 2022-05-18T04:27:00.4400343Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:00.4680637Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwkkrkq1o 2022-05-18T04:27:00.4683236Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwkkrkq1o/_remote_module_non_scriptable.py 2022-05-18T04:27:00.4904988Z dist init r=1, world=2 2022-05-18T04:27:00.4909865Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:00.4910694Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:00.4911390Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:01.8445920Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:01.8446466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:02.0397557Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:27:02.0398172Z warnings.warn( 2022-05-18T04:27:02.0477688Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:02.0478227Z warnings.warn( 2022-05-18T04:27:02.0501600Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:02.0521945Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:02.0522641Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:02.0577805Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0579143Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0580424Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0581699Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0604997Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:02.0656898Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0658357Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0659630Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0660897Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0744053Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:02.0748652Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:02.0749337Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:02.0846715Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:02.0922259Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0923567Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0924938Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0926215Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0927485Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.0928746Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.1308685Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:02.1309557Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:02.1310702Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:02.1311368Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:02.1437134Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:27:02.1447893Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:27:02.1448679Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:02.1475147Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.1476469Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.1477756Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.1539896Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:02.1565532Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.1567004Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.1568312Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:02.5211599Z ok (3.032s) 2022-05-18T04:27:02.5347157Z test_mixture_of_experts_with_delay_before_free_offload_false_none_no_shard (__main__.TestParityWithDDP) ... 
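The "Module is input on CPU, we are moving it to <rank> ..." UserWarning that recurs in the output above is emitted when FSDP is handed a module whose parameters still live on the CPU and CPU offload is not requested: FSDP moves it to cuda:<rank> for parameter verification, flattening and sharding, then moves it back. Placing the module on its target device before wrapping avoids the round-trip. A minimal sketch, assuming one GPU per rank and a process group that is already initialized:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_device(module: torch.nn.Module, rank: int) -> FSDP:
        # Moving the module to its rank's GPU first means FSDP has nothing to
        # shuffle between CPU and cuda:<rank>, so the UserWarning above is not
        # triggered.
        return FSDP(module.to(torch.device("cuda", rank)))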
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38097 2022-05-18T04:27:02.5454999Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38098 2022-05-18T04:27:03.4726516Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplku4vg8o 2022-05-18T04:27:03.4727553Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplku4vg8o/_remote_module_non_scriptable.py 2022-05-18T04:27:03.4948461Z dist init r=1, world=2 2022-05-18T04:27:03.4952440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:03.4984717Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplkjoz44r 2022-05-18T04:27:03.4987559Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplkjoz44r/_remote_module_non_scriptable.py 2022-05-18T04:27:03.5202831Z dist init r=0, world=2 2022-05-18T04:27:03.5207001Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:03.5207895Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:03.5259378Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:04.8601331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:04.8601891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:05.0595948Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:05.0597093Z warnings.warn( 2022-05-18T04:27:05.0644825Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:05.0645948Z warnings.warn( 2022-05-18T04:27:05.0671678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:05.0689546Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:05.0690900Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:05.0735556Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:05.0738339Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:05.0740805Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:05.0775306Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:05.0819358Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:05.0822428Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:05.0825175Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:05.1310768Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:05.1312135Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:05.1313969Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:05.1315224Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:05.1448867Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:05.1449911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:05.1451299Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:05.1452631Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:05.5535246Z ok (3.032s) 2022-05-18T04:27:05.5668398Z test_mixture_of_experts_with_delay_before_free_offload_false_none_none (__main__.TestParityWithDDP) ... 
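The FutureWarning repeated above has a direct fix: torch.testing.assert_close is the documented replacement for the deprecated torch.testing.assert_allclose, with the linked issue (61844) covering the differences in default tolerances. A small before/after sketch:

    import torch
    from torch.testing import assert_close

    expected = torch.tensor([1.0, 2.0, 3.0])
    actual = expected + 1e-7

    # Deprecated since 1.12, slated for removal in 1.14:
    # torch.testing.assert_allclose(actual, expected)

    # Replacement. Tolerances can be passed explicitly if a test needs to keep
    # the old assert_allclose float32 defaults rather than assert_close's.
    assert_close(actual, expected, rtol=1.3e-6, atol=1e-5)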
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38200 2022-05-18T04:27:05.5775944Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38201 2022-05-18T04:27:06.4761583Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpclyzxegi 2022-05-18T04:27:06.4762703Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpclyzxegi/_remote_module_non_scriptable.py 2022-05-18T04:27:06.4879272Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps7_ocqt9 2022-05-18T04:27:06.4882012Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps7_ocqt9/_remote_module_non_scriptable.py 2022-05-18T04:27:06.4984628Z dist init r=1, world=2 2022-05-18T04:27:06.4989576Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:06.5101414Z dist init r=0, world=2 2022-05-18T04:27:06.5105679Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:06.5106854Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:06.5194984Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:07.8610692Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:07.8611239Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:08.0581631Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:08.0582232Z warnings.warn( 2022-05-18T04:27:08.0585948Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:08.0586485Z warnings.warn( 2022-05-18T04:27:08.0623384Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:08.0627774Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:08.0628482Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:08.0674675Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:08.0676000Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:08.0677262Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:08.0726705Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:08.0781601Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3195: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:27:08.0782481Z ((rank, indices[rank]) for rank in range(self.world_size)), 2, 2022-05-18T04:27:08.5671322Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:08.5672294Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:08.5673436Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:08.5674094Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:08.5802824Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:08.5810770Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:08.5811485Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:08.5905585Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:09.2868214Z ok (3.733s) 2022-05-18T04:27:09.3002566Z test_mixture_of_experts_with_delay_before_free_offload_false_none_shard_grad_op (__main__.TestParityWithDDP) ... 
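The recurring "[W python_variable.cpp:205] ... Deallocating Tensor that still has live PyObject references" warnings refer to the pattern described in the message itself: a weak reference to a tensor is taken and later dereferenced without _fix_weakref() being called afterwards. The actual trigger here is internal to FSDP's parameter handling, but the pattern the warning text points at looks roughly like the following (illustrative only, not a reproducer):

    import weakref
    import torch

    t = torch.zeros(4)
    ref = weakref.ref(t)     # take out a weak reference to the Tensor

    resurrected = ref()      # dereference it back into a strong reference; the
                             # warning says _fix_weakref() should be called on
                             # the tensor after doing this
    del t, resurrected       # otherwise, if the underlying Tensor is freed
                             # while PyObject references are still being
                             # tracked, PyTorch logs the warning seen above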
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38287 2022-05-18T04:27:09.3110510Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38288 2022-05-18T04:27:10.2288910Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoelgosiv 2022-05-18T04:27:10.2290016Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoelgosiv/_remote_module_non_scriptable.py 2022-05-18T04:27:10.2503592Z dist init r=0, world=2 2022-05-18T04:27:10.2507874Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:10.2660576Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo262wcl7 2022-05-18T04:27:10.2663436Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo262wcl7/_remote_module_non_scriptable.py 2022-05-18T04:27:10.2885484Z dist init r=1, world=2 2022-05-18T04:27:10.2890176Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:10.2890977Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:10.2916296Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:11.6533286Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:11.6533866Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:11.8553473Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:11.8554101Z warnings.warn( 2022-05-18T04:27:11.8554896Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:11.8555422Z warnings.warn( 2022-05-18T04:27:11.8594256Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:11.8594780Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:11.8595483Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:11.8596155Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:11.8642123Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:11.8643648Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:27:11.8644949Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:11.8647525Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3195: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:27:11.8648408Z ((rank, indices[rank]) for rank in range(self.world_size)), 2, 2022-05-18T04:27:11.8928801Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:11.8929721Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:11.8930678Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:11.8931338Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:11.9059669Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:11.9060802Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:11.9062269Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:11.9162781Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:12.2188131Z ok (2.932s) 2022-05-18T04:27:12.2322505Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_post_no_shard (__main__.TestParityWithDDP) ... 
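The "Added key: store_based_barrier_key:N to store for rank: R" / "Completed store-based barrier" INFO lines come from torch.distributed.distributed_c10d: when a process group is created, each rank writes a key into the shared rendezvous store and blocks until all world_size ranks have checked in, and every additional group bumps the key number. A minimal sketch of the initialization that produces barrier key 1, assuming the usual MASTER_ADDR/MASTER_PORT rendezvous variables and one GPU per rank; the "dist init" print mirrors the harness output seen above:

    import os
    import torch.distributed as dist

    def dist_init(rank: int, world_size: int) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        # init_process_group performs the store-based barrier logged above:
        # each rank adds store_based_barrier_key:1 and waits until world_size
        # entries are present before returning.
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        print(f"dist init r={rank}, world={world_size}")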
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38374 2022-05-18T04:27:12.2429952Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38375 2022-05-18T04:27:13.1097113Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3sq8uamy 2022-05-18T04:27:13.1098388Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3sq8uamy/_remote_module_non_scriptable.py 2022-05-18T04:27:13.1285893Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpazcur_64 2022-05-18T04:27:13.1288708Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpazcur_64/_remote_module_non_scriptable.py 2022-05-18T04:27:13.1310985Z dist init r=1, world=2 2022-05-18T04:27:13.1315448Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:13.1511385Z dist init r=0, world=2 2022-05-18T04:27:13.1515843Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:13.1517222Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:13.1520905Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:14.5005665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:14.5006378Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:14.7001076Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:14.7001715Z warnings.warn( 2022-05-18T04:27:14.7002466Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:14.7003024Z warnings.warn( 2022-05-18T04:27:14.7040981Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:14.7042823Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:14.7044129Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:14.7087263Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:14.7088575Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:14.7089847Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:14.7144203Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:14.7187259Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:14.7188553Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:14.7189820Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:14.7711163Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:14.7711868Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:14.7713069Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:14.7713750Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:14.7841666Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:14.7846604Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:14.7847783Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:14.7944888Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:15.1505935Z ok (2.932s) 2022-05-18T04:27:15.1640154Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_post_none (__main__.TestParityWithDDP) ... 
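The "Started process 0/1 with pid ..." lines show that every test case spawns a fresh pair of worker processes (world=2), one per GPU. The harness in torch.testing._internal.common_distributed does this internally; a hedged sketch of the same general pattern using torch.multiprocessing.spawn:

    import torch.multiprocessing as mp

    def worker(rank: int, world_size: int) -> None:
        # A real worker would call dist_init(rank, world_size) as sketched
        # earlier and then run one test case on its own GPU.
        print(f"rank {rank} of {world_size} started")

    if __name__ == "__main__":
        world_size = 2  # matches "world=2" in the log
        mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)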
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38477 2022-05-18T04:27:15.1747204Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38478 2022-05-18T04:27:16.0697053Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg3_i2e1y 2022-05-18T04:27:16.0698200Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg3_i2e1y/_remote_module_non_scriptable.py 2022-05-18T04:27:16.0827383Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz039u10b 2022-05-18T04:27:16.0830046Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz039u10b/_remote_module_non_scriptable.py 2022-05-18T04:27:16.0911710Z dist init r=0, world=2 2022-05-18T04:27:16.0915923Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:16.1051970Z dist init r=1, world=2 2022-05-18T04:27:16.1056402Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:16.1057552Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:16.1121341Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:17.4629992Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:17.4630521Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:17.6596826Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:17.6597989Z warnings.warn( 2022-05-18T04:27:17.6656839Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:17.6657962Z warnings.warn( 2022-05-18T04:27:17.6682002Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:17.6700867Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:17.6702197Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:17.6750479Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:17.6753279Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:17.6755701Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:17.6785419Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:17.6838353Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3195: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:27:17.6840076Z ((rank, indices[rank]) for rank in range(self.world_size)), 2, 2022-05-18T04:27:18.1751282Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:18.1752697Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:18.1754486Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:18.1755706Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:18.1881664Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:18.1884971Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:18.1886418Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:18.1984889Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:18.8839530Z ok (3.733s) 2022-05-18T04:27:18.8975200Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38564 2022-05-18T04:27:18.9080986Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38565 2022-05-18T04:27:19.8000669Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyy928dvv 2022-05-18T04:27:19.8001666Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyy928dvv/_remote_module_non_scriptable.py 2022-05-18T04:27:19.8029494Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3bi18hc6 2022-05-18T04:27:19.8032045Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3bi18hc6/_remote_module_non_scriptable.py 2022-05-18T04:27:19.8215864Z dist init r=0, world=2 2022-05-18T04:27:19.8220190Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:19.8254657Z dist init r=1, world=2 2022-05-18T04:27:19.8259029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:19.8260882Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:19.8323942Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:21.1508944Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:21.1509774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:21.3470914Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:21.3471505Z warnings.warn( 2022-05-18T04:27:21.3556498Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:21.3557052Z warnings.warn( 2022-05-18T04:27:21.3580735Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:21.3598559Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:21.3599600Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:21.3646953Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:21.3648281Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:21.3649561Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:21.3684052Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:21.3735960Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3195: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:27:21.3736854Z ((rank, indices[rank]) for rank in range(self.world_size)), 2, 2022-05-18T04:27:21.4017387Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:21.4018341Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:21.4019285Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:21.4020107Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:21.4146469Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:21.4151137Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:21.4151837Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:21.4249172Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:21.7156526Z ok (2.832s) 2022-05-18T04:27:21.7292258Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38651 2022-05-18T04:27:21.7399927Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38652 2022-05-18T04:27:22.6591540Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkgdbuso7 2022-05-18T04:27:22.6593054Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkgdbuso7/_remote_module_non_scriptable.py 2022-05-18T04:27:22.6815155Z dist init r=1, world=2 2022-05-18T04:27:22.6819694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:22.6876617Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcj5pfhui 2022-05-18T04:27:22.6879489Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcj5pfhui/_remote_module_non_scriptable.py 2022-05-18T04:27:22.7092813Z dist init r=0, world=2 2022-05-18T04:27:22.7097252Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:22.7098351Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:22.7126586Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:24.0492883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:24.0493400Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:24.2465679Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:24.2466321Z warnings.warn( 2022-05-18T04:27:24.2512741Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:24.2513317Z warnings.warn( 2022-05-18T04:27:24.2539289Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:24.2554430Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:24.2555127Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:24.2598043Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:24.2599591Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:24.2600888Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:24.2642300Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:24.2686806Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:24.2688103Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:24.2689378Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:24.3181615Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:24.3182339Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:24.3183270Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:24.3184189Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:24.3316328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:24.3320871Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:24.3321611Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:24.3418844Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:24.7480007Z ok (3.032s) 2022-05-18T04:27:24.7611305Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_pre_none (__main__.TestParityWithDDP) ... 
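The recurring UserWarning from `fully_sharded_data_parallel.py:911` concerns wrapping a module that still lives on the CPU: FSDP temporarily moves it to the rank's GPU for parameter verification, flattening, and sharding, then moves it back. A minimal sketch of avoiding that round-trip by placing the module on the GPU before wrapping; it assumes a process group is already initialized and one GPU per rank, and the `nn.Linear` model is a stand-in, not the mixture-of-experts module under test.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes dist.init_process_group(...) has already run and each rank owns one GPU.
rank = dist.get_rank()
torch.cuda.set_device(rank)

model = nn.Linear(8, 8)

# Wrapping a CPU-resident module is what triggers the "Module is input on CPU,
# we are moving it to <rank>" UserWarning above; moving it to the GPU first
# avoids the temporary device round-trip.
model = model.cuda(rank)
fsdp_model = FSDP(model)
```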
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38754 2022-05-18T04:27:24.7721763Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38755 2022-05-18T04:27:25.6677131Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr15bwlnb 2022-05-18T04:27:25.6678146Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr15bwlnb/_remote_module_non_scriptable.py 2022-05-18T04:27:25.6891791Z dist init r=1, world=2 2022-05-18T04:27:25.6895767Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:25.7068992Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph4nu_wpt 2022-05-18T04:27:25.7071821Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph4nu_wpt/_remote_module_non_scriptable.py 2022-05-18T04:27:25.7292153Z dist init r=0, world=2 2022-05-18T04:27:25.7296593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:25.7297585Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:25.7303882Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:27.0928418Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:27.0928933Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:27.2919838Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:27.2920437Z warnings.warn( 2022-05-18T04:27:27.2937049Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:27.2937604Z warnings.warn( 2022-05-18T04:27:27.2961936Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:27.2980985Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:27.2981872Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:27.3036668Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3195: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:27:27.3037548Z ((rank, indices[rank]) for rank in range(self.world_size)), 2, 2022-05-18T04:27:27.3064987Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:27.3111436Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:27.3112738Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:27.3114005Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:27.8104107Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:27.8105104Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:27.8106653Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:27.8107328Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:27.8236659Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:27.8247776Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:27.8248494Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:27.8339671Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:28.5815963Z ok (3.833s) 2022-05-18T04:27:28.5950859Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... 
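The `[W python_variable.cpp:205]` warnings that dominate this run describe a tensor being deallocated while a Python object still refers to it, and they name `_fix_weakref()` as the call to make after dereferencing a weak reference. A purely illustrative sketch of the pattern the message refers to; `_fix_weakref()` is an internal, underscore-prefixed helper named only by the warning text, and this snippet does not reproduce the C++-side deallocation condition hit in the test run.

```python
import weakref

import torch

# Illustrative only: shows the calls named by the warning text.
t = torch.ones(4)
ref = weakref.ref(t)      # "took out a weak reference to Tensor"

strong = ref()            # dereference the weak reference again
if strong is not None:
    strong.add_(1)
    # _fix_weakref() is the internal helper the warning says to call after
    # dereferencing; it is not part of the public API.
    strong._fix_weakref()
```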
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38841 2022-05-18T04:27:28.6056057Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38842 2022-05-18T04:27:29.4971695Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6wxfw8ru 2022-05-18T04:27:29.4973108Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6wxfw8ru/_remote_module_non_scriptable.py 2022-05-18T04:27:29.5186207Z dist init r=1, world=2 2022-05-18T04:27:29.5190335Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:29.5484319Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp62eftn4t 2022-05-18T04:27:29.5487080Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp62eftn4t/_remote_module_non_scriptable.py 2022-05-18T04:27:29.5701479Z dist init r=0, world=2 2022-05-18T04:27:29.5705729Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:29.5707153Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:29.5801970Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:30.9073074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:30.9073618Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:31.1045256Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:31.1045851Z warnings.warn( 2022-05-18T04:27:31.1073508Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:31.1074041Z warnings.warn( 2022-05-18T04:27:31.1097583Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:31.1115528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:31.1116250Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:31.1168206Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:3195: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:27:31.1169301Z ((rank, indices[rank]) for rank in range(self.world_size)), 2, 2022-05-18T04:27:31.1200757Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:31.1247142Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:31.1248446Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:31.1249713Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:31.1533399Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:31.1534677Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:31.1535612Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:31.1536267Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:31.1665096Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:31.1665620Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:31.1666313Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:31.1667002Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:31.5134214Z ok (2.932s) 2022-05-18T04:27:31.5264525Z test_mixture_of_experts_with_delay_before_free_offload_true_none_no_shard (__main__.TestParityWithDDP) ... 
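The `dist init r=<rank>, world=2` lines and the `store_based_barrier_key` INFO messages come from process-group initialization: each rank writes a key into the shared store and waits until all ranks have done the same. A minimal sketch of the rendezvous that produces those lines, assuming environment-variable addressing; the address, port, and env defaults here are placeholders, not what the CI harness uses.

```python
import os

import torch.distributed as dist

# Placeholder rendezvous settings; the CI harness supplies its own.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "2"))

# init_process_group performs a store-based barrier so every rank observes the
# new group before any collective runs; each barrier against the same store
# uses the next store_based_barrier_key:<n>, the counter visible in the log.
dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
```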
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38928 2022-05-18T04:27:31.5369967Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38929 2022-05-18T04:27:32.4872691Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4offr_lq 2022-05-18T04:27:32.4874109Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4offr_lq/_remote_module_non_scriptable.py 2022-05-18T04:27:32.4983747Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc6bmgtuq 2022-05-18T04:27:32.4986789Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc6bmgtuq/_remote_module_non_scriptable.py 2022-05-18T04:27:32.5093476Z dist init r=0, world=2 2022-05-18T04:27:32.5097986Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:32.5199285Z dist init r=1, world=2 2022-05-18T04:27:32.5203655Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:32.5204655Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:32.5303631Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:33.8402448Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:33.8402961Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:34.0711485Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:34.0712104Z warnings.warn( 2022-05-18T04:27:34.0733537Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:34.0734088Z warnings.warn( 2022-05-18T04:27:34.0757602Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:34.0778581Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:34.0779660Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:34.0829897Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.0831244Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.0832520Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.0833793Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.0835061Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.0836304Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.0860538Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:34.0907594Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.0909028Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.0910344Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.0911609Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.0912867Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.0914116Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:27:34.0996756Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:34.1002017Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:34.1002738Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:34.1099659Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:34.1169058Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.1170594Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.1171890Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.1173140Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.1174399Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.1175793Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.1788888Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:34.1789649Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:34.1795650Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:27:34.1796338Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:34.1922603Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:27:34.1933510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:27:34.1934413Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:34.1963164Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.2025605Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:34.2052757Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:34.5449191Z ok (3.031s) 2022-05-18T04:27:34.5579460Z test_mixture_of_experts_with_delay_before_free_offload_true_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39033 2022-05-18T04:27:34.5685666Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39034 2022-05-18T04:27:35.4616878Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgc5qlnog 2022-05-18T04:27:35.4617856Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgc5qlnog/_remote_module_non_scriptable.py 2022-05-18T04:27:35.4830928Z dist init r=0, world=2 2022-05-18T04:27:35.4835138Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:35.5039849Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprcxw8gio 2022-05-18T04:27:35.5042705Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprcxw8gio/_remote_module_non_scriptable.py 2022-05-18T04:27:35.5255343Z dist init r=1, world=2 2022-05-18T04:27:35.5259514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:35.5260551Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:35.5345677Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:36.8339795Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:36.8340303Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:37.0343381Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:27:37.0344267Z warnings.warn( 2022-05-18T04:27:37.0360822Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:37.0361394Z warnings.warn( 2022-05-18T04:27:37.0384728Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:37.0402242Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:37.0403183Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:37.0453297Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0454769Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0456044Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0457312Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0458569Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0459814Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0487901Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:37.0536698Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0538005Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0539366Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0540651Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0541903Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0543150Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0621655Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:37.0628980Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:37.0629853Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:37.0724637Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:37.0886908Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0888200Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0889471Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0890728Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0891987Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.0893250Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.6209000Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:37.6210132Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:37.6211119Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:37.6211791Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:37.6334210Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:27:37.6339345Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:27:37.6340083Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:37.6361393Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.6362827Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.6436808Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:37.6458140Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:37.6459432Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:27:38.4782768Z ok (3.933s) 2022-05-18T04:27:38.4917185Z test_mixture_of_experts_with_delay_before_free_offload_true_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39122 2022-05-18T04:27:38.5023347Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39123 2022-05-18T04:27:39.4009234Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9mtkyj7n 2022-05-18T04:27:39.4010449Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9mtkyj7n/_remote_module_non_scriptable.py 2022-05-18T04:27:39.4234144Z dist init r=1, world=2 2022-05-18T04:27:39.4238945Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:39.4399995Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp13cat4td 2022-05-18T04:27:39.4402487Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp13cat4td/_remote_module_non_scriptable.py 2022-05-18T04:27:39.4617804Z dist init r=0, world=2 2022-05-18T04:27:39.4621828Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:39.4622644Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:39.4648436Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:40.7908844Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:40.7909876Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:40.9915552Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:40.9916716Z warnings.warn( 2022-05-18T04:27:40.9962758Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:40.9963876Z warnings.warn( 2022-05-18T04:27:40.9986275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:41.0008645Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:41.0010039Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:41.0064094Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0066814Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:27:41.0069308Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0071881Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0074403Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0076918Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0089491Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:41.0140482Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0144296Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0147591Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0150820Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0154036Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0157216Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0228176Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:41.0233991Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:41.0234730Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:41.0332034Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:41.0409356Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0412016Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0414491Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0415885Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0417164Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0418601Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.0797708Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:41.0799109Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:41.0801071Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:41.0802414Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:41.0929154Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:27:41.0940335Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:27:41.0941676Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:41.0973365Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.1032492Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:41.1060501Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:41.4099797Z ok (2.932s) 2022-05-18T04:27:41.4232658Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_post_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39211 2022-05-18T04:27:41.4337026Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39212 2022-05-18T04:27:42.3449146Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9i_yx7_v 2022-05-18T04:27:42.3450316Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9i_yx7_v/_remote_module_non_scriptable.py 2022-05-18T04:27:42.3643798Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1egmhlrm 2022-05-18T04:27:42.3645909Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1egmhlrm/_remote_module_non_scriptable.py 2022-05-18T04:27:42.3671413Z dist init r=0, world=2 2022-05-18T04:27:42.3676095Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:42.3858794Z dist init r=1, world=2 2022-05-18T04:27:42.3862594Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:42.3863940Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:42.3881078Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:43.7266277Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:43.7267667Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:43.9270403Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:27:43.9271816Z warnings.warn( 2022-05-18T04:27:43.9313328Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:43.9314478Z warnings.warn( 2022-05-18T04:27:43.9339140Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:43.9356057Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:43.9356873Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:43.9405558Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9407124Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9408392Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9409718Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9410989Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9412266Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9442897Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:43.9492117Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9494779Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9497500Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9500013Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9502599Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9505426Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9589179Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:43.9597110Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:43.9598513Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:43.9692709Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:43.9761981Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9763829Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9765122Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9766438Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9767698Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:43.9768944Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:44.0456733Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:44.0458108Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:44.0459904Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:44.0461180Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:44.0587053Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:27:44.0590369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:27:44.0591203Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:44.0619509Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:44.0690426Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:44.0719931Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:44.4415839Z ok (3.031s) 2022-05-18T04:27:44.4549483Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_post_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39316 2022-05-18T04:27:44.4654505Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39317 2022-05-18T04:27:45.3600173Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpumopop5u 2022-05-18T04:27:45.3601215Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpumopop5u/_remote_module_non_scriptable.py 2022-05-18T04:27:45.3623535Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp248bqatp 2022-05-18T04:27:45.3626905Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp248bqatp/_remote_module_non_scriptable.py 2022-05-18T04:27:45.3816000Z dist init r=0, world=2 2022-05-18T04:27:45.3819847Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:45.3864839Z dist init r=1, world=2 2022-05-18T04:27:45.3869617Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:45.3870778Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:45.3923368Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:46.7309593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:46.7310139Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:46.9355102Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:46.9356238Z warnings.warn( 2022-05-18T04:27:46.9358149Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:46.9359251Z warnings.warn( 2022-05-18T04:27:46.9398069Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:46.9399074Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:46.9400434Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:46.9401832Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:46.9450519Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9453170Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:27:46.9455945Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9458447Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9461017Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9463542Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9466576Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9469259Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9471735Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9474350Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9476288Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9477587Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:27:46.9555726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:46.9563598Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:46.9564309Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:46.9659352Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:46.9822684Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9825672Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9828349Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9831098Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9834112Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:46.9837052Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:47.5156118Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:47.5157520Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:47.5159494Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
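The FutureWarning repeated throughout this run points at the documented replacement, torch.testing.assert_close(). A minimal sketch of the migration, with illustrative tensor values only:

    import torch

    actual = torch.tensor([1.0, 2.0, 3.0])
    expected = torch.tensor([1.0, 2.0, 3.0])

    # Deprecated since 1.12 and slated for removal in 1.14 (what the suite still calls):
    # torch.testing.assert_allclose(actual, expected)

    # Replacement recommended by the warning:
    torch.testing.assert_close(actual, expected)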
2022-05-18T04:27:47.5160728Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:47.5294313Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:27:47.5305718Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:27:47.5306856Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:47.5332007Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:47.5333289Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:47.5397900Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:47.5421080Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:47.5424073Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:48.3748017Z ok (3.933s) 2022-05-18T04:27:48.3881754Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39405 2022-05-18T04:27:48.3987432Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39406 2022-05-18T04:27:49.2925758Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0urt8gju 2022-05-18T04:27:49.2926929Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0urt8gju/_remote_module_non_scriptable.py 2022-05-18T04:27:49.3141614Z dist init r=1, world=2 2022-05-18T04:27:49.3146344Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:49.3458238Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpee4xb5a_ 2022-05-18T04:27:49.3461245Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpee4xb5a_/_remote_module_non_scriptable.py 2022-05-18T04:27:49.3697224Z dist init r=0, world=2 2022-05-18T04:27:49.3701783Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:49.3702751Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
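The "dist init r=N, world=2" lines and the store_based_barrier_key messages come from the two worker processes initializing the default process group; the barrier completes once both ranks have checked in with the store. A minimal sketch of that setup, assuming an env:// style rendezvous with hypothetical MASTER_ADDR/MASTER_PORT values (not the harness's actual code):

    import os
    import torch.distributed as dist

    def dist_init(rank: int, world_size: int = 2) -> None:
        # Assumed rendezvous settings, for illustration only.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        # init_process_group performs the store-based barrier logged above
        # once all world_size ranks have registered.
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
        print(f"dist init r={rank}, world={world_size}")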
2022-05-18T04:27:49.3758269Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:50.7174895Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:50.7176291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:50.9132214Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:50.9133555Z warnings.warn( 2022-05-18T04:27:50.9199609Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:50.9200679Z warnings.warn( 2022-05-18T04:27:50.9221799Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:50.9246663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:50.9247781Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:50.9299598Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9301258Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9302542Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9304075Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9305365Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9306624Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:27:50.9324778Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:50.9374514Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9375807Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9377212Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9378504Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9379772Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9381036Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9468436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:50.9469077Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:50.9469738Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:50.9470578Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:50.9547391Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9548698Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:27:50.9549982Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9551261Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9552531Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9553794Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:50.9936391Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:50.9937089Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:50.9940069Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:50.9940751Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:51.0076283Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:27:51.0088731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:27:51.0089417Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:51.0122416Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:51.0179280Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:51.0207483Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:27:51.3063144Z ok (2.931s) 2022-05-18T04:27:51.3196351Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39494 2022-05-18T04:27:51.3301542Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39495 2022-05-18T04:27:52.2781819Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp9pmqhnq 2022-05-18T04:27:52.2782715Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp9pmqhnq/_remote_module_non_scriptable.py 2022-05-18T04:27:52.2925548Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppjn4s6qk 2022-05-18T04:27:52.2928772Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppjn4s6qk/_remote_module_non_scriptable.py 2022-05-18T04:27:52.2999978Z dist init r=0, world=2 2022-05-18T04:27:52.3003963Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:52.3161347Z dist init r=1, world=2 2022-05-18T04:27:52.3165895Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:52.3167002Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:52.3209664Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:53.6536701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:53.6537224Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:53.8544670Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:53.8545718Z warnings.warn( 2022-05-18T04:27:53.8546525Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:53.8547068Z warnings.warn( 2022-05-18T04:27:53.8585333Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:53.8587787Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:53.8588477Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:53.8635732Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8637036Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:27:53.8638482Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8639754Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8640995Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8642256Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8688615Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:53.8736174Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8737469Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8738738Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8740141Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8741416Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8742679Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8823881Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:53.8832903Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:53.8833585Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:53.8926938Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:53.8994221Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8995509Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8997225Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.8999719Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.9002220Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.9004749Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.9618891Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:53.9619600Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:53.9623174Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:53.9624189Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:53.9747684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:27:53.9756535Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:27:53.9757990Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:53.9785617Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:53.9850852Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:53.9877783Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:54.3380689Z ok (3.032s) 2022-05-18T04:27:54.3513793Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_pre_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39599 2022-05-18T04:27:54.3619922Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39600 2022-05-18T04:27:55.2930936Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1761f0ve 2022-05-18T04:27:55.2931699Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1761f0ve/_remote_module_non_scriptable.py 2022-05-18T04:27:55.2951810Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6vpg6cwc 2022-05-18T04:27:55.2954422Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6vpg6cwc/_remote_module_non_scriptable.py 2022-05-18T04:27:55.3147455Z dist init r=0, world=2 2022-05-18T04:27:55.3151538Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:55.3169963Z dist init r=1, world=2 2022-05-18T04:27:55.3174123Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:55.3174922Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:55.3255129Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:56.6339161Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:56.6339668Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:56.8284039Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
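The test names in this block (e.g. ..._offload_true_prefetch_pre_none, ..._prefetch_post_shard_grad_op) appear to encode the FSDP configuration being exercised: CPU offload on or off, backward prefetch pre or post, and the sharding strategy (none / no_shard / shard_grad_op). A minimal sketch of constructing one such configuration, assuming these names are exported from torch.distributed.fsdp in this build and using a placeholder module rather than the mixture-of-experts model the tests build:

    import torch
    from torch.distributed.fsdp import (
        BackwardPrefetch,
        CPUOffload,
        FullyShardedDataParallel as FSDP,
        ShardingStrategy,
    )

    # Placeholder module; requires an initialized process group
    # (see the dist_init sketch above) before FSDP can wrap it.
    model = torch.nn.Linear(8, 8)

    fsdp_model = FSDP(
        model,
        cpu_offload=CPUOffload(offload_params=True),       # "offload_true"
        backward_prefetch=BackwardPrefetch.BACKWARD_PRE,    # "prefetch_pre"
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,   # "shard_grad_op"
    )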
2022-05-18T04:27:56.8284648Z warnings.warn( 2022-05-18T04:27:56.8350732Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:27:56.8351280Z warnings.warn( 2022-05-18T04:27:56.8376684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:27:56.8391668Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:27:56.8392358Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:56.8441604Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8442907Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8444189Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8445590Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8446854Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8448117Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8479834Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:27:56.8529338Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8530618Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8531878Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8533140Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8534505Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8535762Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8616597Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:27:56.8626030Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:27:56.8627203Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:56.8719445Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:27:56.8880245Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8881777Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8883057Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8884328Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8885603Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:56.8886875Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:57.4208218Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:57.4208942Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:57.4209914Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:27:57.4210751Z warnings.warn(msg, FutureWarning) 2022-05-18T04:27:57.4335583Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:27:57.4341251Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:27:57.4341949Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:57.4363949Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:57.4365246Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:57.4438432Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:27:57.4459388Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:27:57.4460819Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:27:58.2716066Z ok (3.933s) 2022-05-18T04:27:58.2849672Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39688 2022-05-18T04:27:58.2954327Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39689 2022-05-18T04:27:59.2013717Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnlhq595t 2022-05-18T04:27:59.2015267Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnlhq595t/_remote_module_non_scriptable.py 2022-05-18T04:27:59.2254742Z dist init r=0, world=2 2022-05-18T04:27:59.2259506Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:27:59.2343073Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1a8wcqk4 2022-05-18T04:27:59.2345794Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1a8wcqk4/_remote_module_non_scriptable.py 2022-05-18T04:27:59.2561147Z dist init r=1, world=2 2022-05-18T04:27:59.2566120Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:27:59.2567460Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:27:59.2568699Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:00.6164436Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:00.6165402Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:00.8183400Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:00.8185556Z warnings.warn( 2022-05-18T04:28:00.8232567Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:00.8233678Z warnings.warn( 2022-05-18T04:28:00.8255921Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:28:00.8281928Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:28:00.8282660Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:28:00.8336099Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8337406Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:28:00.8338880Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8340149Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8341461Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8342722Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8359744Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:28:00.8410840Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8413532Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8416023Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8418799Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8421363Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8424153Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8506330Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:28:00.8511862Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:28:00.8513304Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:28:00.8610283Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:28:00.8686881Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8689389Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8690946Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8692234Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8693503Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.8694765Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.9083001Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:00.9084591Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:00.9086410Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:00.9087676Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:00.9222715Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:28:00.9234514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:28:00.9235374Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:28:00.9268461Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:00.9326409Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:28:00.9356281Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:28:01.3034662Z ok (3.032s) 2022-05-18T04:28:01.3168245Z test_nested_all_wrapped_model_offload_false_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39777 2022-05-18T04:28:01.3274032Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39778 2022-05-18T04:28:02.2611470Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsl66py2u 2022-05-18T04:28:02.2612483Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsl66py2u/_remote_module_non_scriptable.py 2022-05-18T04:28:02.2828162Z dist init r=0, world=2 2022-05-18T04:28:02.2832354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:02.2927058Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa4zx8eqj 2022-05-18T04:28:02.2929962Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa4zx8eqj/_remote_module_non_scriptable.py 2022-05-18T04:28:02.3144285Z dist init r=1, world=2 2022-05-18T04:28:02.3148944Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:02.3149790Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:02.3241103Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:03.6498255Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:03.6498785Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:03.8514078Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:03.8521753Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
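Editor's note on the repeated [W python_variable.cpp:205] warnings above: they come from PyTorch's C++ tensor deallocation path when a Tensor is freed while its Python wrapper still has live references, and the message points at the private Tensor._fix_weakref() call. The sketch below only illustrates the sequence the warning text describes, under the assumption that _fix_weakref() behaves as the message implies (it is an internal, version-dependent API); it is not claimed to reproduce the warning seen in this FSDP run.

    import weakref
    import torch

    t = torch.ones(3)
    wr = weakref.ref(t)        # take out a weak reference to the Tensor

    alias = wr()               # dereference the weak reference
    if alias is not None:
        # The warning suggests calling the (private, version-dependent)
        # Tensor._fix_weakref() after dereferencing, so the C++ object and
        # its Python wrapper stay consistent when the Tensor is later freed.
        alias._fix_weakref()

    del alias, t               # without the call above, teardown like this is
                               # where the "live PyObject references" warning appears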
2022-05-18T04:28:03.8542153Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:03.8542747Z warnings.warn( 2022-05-18T04:28:03.8552467Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:03.8553047Z warnings.warn( 2022-05-18T04:28:03.8895097Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:03.8895784Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:03.8897510Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:03.8898184Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:03.8974716Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:03.8975213Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:04.2352007Z ok (2.932s) 2022-05-18T04:28:04.2485045Z test_nested_all_wrapped_model_offload_false_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39860 2022-05-18T04:28:04.2592944Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39861 2022-05-18T04:28:05.1462227Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4odajw4n 2022-05-18T04:28:05.1463405Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4odajw4n/_remote_module_non_scriptable.py 2022-05-18T04:28:05.1483694Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpms89rxan 2022-05-18T04:28:05.1486808Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpms89rxan/_remote_module_non_scriptable.py 2022-05-18T04:28:05.1683100Z dist init r=0, world=2 2022-05-18T04:28:05.1687408Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:05.1716340Z dist init r=1, world=2 2022-05-18T04:28:05.1720854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:05.1721767Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:05.1791072Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:06.5244068Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:06.5244636Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:06.7246306Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:28:06.7246849Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:06.7275362Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:06.7276034Z warnings.warn( 2022-05-18T04:28:06.7276809Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:06.7277326Z warnings.warn( 2022-05-18T04:28:06.7627562Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:06.7628323Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:06.7638907Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:06.7639584Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:06.7718508Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:06.7719025Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:07.0667165Z ok (2.831s) 2022-05-18T04:28:07.0800697Z test_nested_all_wrapped_model_offload_false_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39943 2022-05-18T04:28:07.0906698Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39944 2022-05-18T04:28:08.0455539Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwug20er2 2022-05-18T04:28:08.0456612Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwug20er2/_remote_module_non_scriptable.py 2022-05-18T04:28:08.0561031Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmporshjvf5 2022-05-18T04:28:08.0563720Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmporshjvf5/_remote_module_non_scriptable.py 2022-05-18T04:28:08.0675850Z dist init r=1, world=2 2022-05-18T04:28:08.0680212Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:08.0786159Z dist init r=0, world=2 2022-05-18T04:28:08.0790437Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:08.0791884Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:08.0885679Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
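Editor's note on the FutureWarning from torch/testing/_deprecated.py repeated throughout this run: the suite still calls torch.testing.assert_allclose(), which is deprecated in favor of torch.testing.assert_close() (the linked issue 61844 has the full migration notes). A minimal migration sketch, assuming the default tolerances are acceptable:

    import torch
    from torch.testing import assert_close

    expected = torch.tensor([1.0, 2.0, 3.0])
    actual = expected + 1e-7   # tiny numerical noise, within default tolerances

    # Deprecated call that triggers the FutureWarning seen in this log:
    # torch.testing.assert_allclose(actual, expected)

    # Replacement; raises AssertionError with a detailed report on mismatch.
    assert_close(actual, expected)

    # Tolerances are keyword-only in the new API.
    assert_close(actual, expected, rtol=1.3e-6, atol=1e-5)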
2022-05-18T04:28:09.4190332Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:09.4190886Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:09.6196293Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:09.6196906Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:09.6224923Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:09.6225547Z warnings.warn( 2022-05-18T04:28:09.6226323Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:09.6226864Z warnings.warn( 2022-05-18T04:28:09.6710600Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:09.6711295Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:09.6713638Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:09.6714361Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:09.6791208Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:09.6792210Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:09.9984035Z ok (2.932s) 2022-05-18T04:28:10.0118770Z test_nested_all_wrapped_model_offload_false_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40026 2022-05-18T04:28:10.0225509Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40027 2022-05-18T04:28:10.9544784Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp03fg4j_v 2022-05-18T04:28:10.9545906Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp03fg4j_v/_remote_module_non_scriptable.py 2022-05-18T04:28:10.9686502Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpscwldcr_ 2022-05-18T04:28:10.9687459Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpscwldcr_/_remote_module_non_scriptable.py 2022-05-18T04:28:10.9767345Z dist init r=0, world=2 2022-05-18T04:28:10.9771558Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:10.9911255Z dist init r=1, world=2 2022-05-18T04:28:10.9915920Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:10.9917182Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
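Editor's note on the "Started process N with pid ..." and "Starting event listener thread for rank N" lines: torch.testing._internal.common_distributed launches one worker process per rank for each of these tests. A rough sketch of the same idea using the public torch.multiprocessing.spawn API; the in-tree harness has extra machinery (error propagation, the event listener threads) that is omitted here:

    import torch.multiprocessing as mp

    def worker(rank: int, world_size: int) -> None:
        # Each spawned process would normally call
        # torch.distributed.init_process_group(...) here and then run the test body.
        print(f"dist init r={rank}, world={world_size}")

    if __name__ == "__main__":
        world_size = 2
        # Starts `world_size` processes and joins them, similar in spirit to the
        # "Started process 0/1 with pid ..." lines in this log.
        mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)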
2022-05-18T04:28:10.9977418Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:12.3519074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:12.3519584Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:12.5516624Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:12.5517150Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:12.5544342Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:12.5544921Z warnings.warn( 2022-05-18T04:28:12.5545677Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:12.5546223Z warnings.warn( 2022-05-18T04:28:12.6038349Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:12.6039050Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:12.6041579Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:12.6042244Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:12.6120136Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:12.6121360Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:12.9303491Z ok (2.932s) 2022-05-18T04:28:12.9439653Z test_nested_all_wrapped_model_offload_false_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40109 2022-05-18T04:28:12.9548454Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40110 2022-05-18T04:28:13.8528143Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxbuf_e7x 2022-05-18T04:28:13.8529174Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxbuf_e7x/_remote_module_non_scriptable.py 2022-05-18T04:28:13.8754240Z dist init r=0, world=2 2022-05-18T04:28:13.8758723Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:13.9014475Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn9ld4afz 2022-05-18T04:28:13.9017358Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn9ld4afz/_remote_module_non_scriptable.py 2022-05-18T04:28:13.9239079Z dist init r=1, world=2 2022-05-18T04:28:13.9243762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:13.9244550Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:13.9269225Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:15.2788247Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:15.2788794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:15.4838573Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:15.4839085Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:15.4866716Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:15.4867301Z warnings.warn( 2022-05-18T04:28:15.4868069Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:15.4868606Z warnings.warn( 2022-05-18T04:28:15.5329411Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:15.5330107Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:15.5331311Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:15.5331975Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:15.5404986Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:15.5405738Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
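Editor's note on the UserWarning from fully_sharded_data_parallel.py:911 above: the module handed to FSDP is still on the CPU, so FSDP temporarily moves it to the target CUDA device for parameter verification, flattening and sharding, then moves it back. Placing the module on its device before wrapping avoids that round trip; a minimal sketch, assuming CUDA is available and the process group is already initialized (as the "dist init r=..., world=2" lines indicate):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_device(rank: int) -> FSDP:
        torch.cuda.set_device(rank)
        # Move the module to its CUDA device first ...
        model = torch.nn.Linear(16, 16).cuda(rank)
        # ... so FSDP does not need to warn that the module is on CPU.
        return FSDP(model)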
2022-05-18T04:28:15.8627930Z ok (2.932s) 2022-05-18T04:28:15.8762231Z test_nested_all_wrapped_model_offload_false_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40192 2022-05-18T04:28:15.8869564Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40193 2022-05-18T04:28:16.7764854Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkb1e6zfg 2022-05-18T04:28:16.7765950Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkb1e6zfg/_remote_module_non_scriptable.py 2022-05-18T04:28:16.7990270Z dist init r=1, world=2 2022-05-18T04:28:16.7994753Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:16.8268498Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyh0kv4it 2022-05-18T04:28:16.8271156Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyh0kv4it/_remote_module_non_scriptable.py 2022-05-18T04:28:16.8491302Z dist init r=0, world=2 2022-05-18T04:28:16.8495634Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:16.8496614Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:16.8504715Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:18.2047996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:18.2048532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:18.4057430Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:18.4058065Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:18.4087765Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:18.4088394Z warnings.warn( 2022-05-18T04:28:18.4089177Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:18.4089712Z warnings.warn( 2022-05-18T04:28:18.4575081Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:18.4575772Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:18.4577654Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:28:18.4578327Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:18.4655631Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:18.4656739Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:18.7946765Z ok (2.932s) 2022-05-18T04:28:18.8080377Z test_nested_all_wrapped_model_offload_false_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40275 2022-05-18T04:28:18.8184982Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40276 2022-05-18T04:28:19.7940453Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgdc1935e 2022-05-18T04:28:19.7941650Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgdc1935e/_remote_module_non_scriptable.py 2022-05-18T04:28:19.8135018Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpinbxtx3g 2022-05-18T04:28:19.8138010Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpinbxtx3g/_remote_module_non_scriptable.py 2022-05-18T04:28:19.8154652Z dist init r=0, world=2 2022-05-18T04:28:19.8158998Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:19.8354341Z dist init r=1, world=2 2022-05-18T04:28:19.8358710Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:19.8360242Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:19.8364240Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:21.1697795Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:21.1698512Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:21.3709026Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:21.3716583Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:21.3737240Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:21.3738133Z warnings.warn( 2022-05-18T04:28:21.3745180Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:21.3745732Z warnings.warn( 2022-05-18T04:28:21.4089197Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:21.4089906Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:21.4095372Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:21.4096056Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:21.4170575Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:21.4171077Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:21.7261304Z ok (2.931s) 2022-05-18T04:28:21.7396141Z test_nested_all_wrapped_model_offload_false_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40358 2022-05-18T04:28:21.7501016Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40359 2022-05-18T04:28:22.6451019Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbaek2_5n 2022-05-18T04:28:22.6452211Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbaek2_5n/_remote_module_non_scriptable.py 2022-05-18T04:28:22.6477682Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl054_imi 2022-05-18T04:28:22.6480626Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl054_imi/_remote_module_non_scriptable.py 2022-05-18T04:28:22.6666588Z dist init r=1, world=2 2022-05-18T04:28:22.6670684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:22.6703561Z dist init r=0, world=2 2022-05-18T04:28:22.6708300Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:22.6709476Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:22.6774237Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:24.0463534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:24.0464267Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:24.2493190Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:24.2493725Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:24.2521397Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:24.2521979Z warnings.warn( 2022-05-18T04:28:24.2522737Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:24.2523572Z warnings.warn( 2022-05-18T04:28:24.2871431Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:28:24.2872186Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:24.2873134Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:24.2873780Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:24.2948920Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:24.2949427Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:24.6579010Z ok (2.932s) 2022-05-18T04:28:24.6712442Z test_nested_all_wrapped_model_offload_false_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40441 2022-05-18T04:28:24.6818362Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40442 2022-05-18T04:28:25.6143199Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp40shdown 2022-05-18T04:28:25.6144224Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp40shdown/_remote_module_non_scriptable.py 2022-05-18T04:28:25.6260073Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp614na5up 2022-05-18T04:28:25.6263047Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp614na5up/_remote_module_non_scriptable.py 2022-05-18T04:28:25.6360764Z dist init r=1, world=2 2022-05-18T04:28:25.6364862Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:25.6482813Z dist init r=0, world=2 2022-05-18T04:28:25.6487222Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:25.6488009Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:25.6569988Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:27.0109459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:27.0109996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:27.2094850Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:27.2095392Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:27.2123881Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:27.2124462Z warnings.warn( 2022-05-18T04:28:27.2125215Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:27.2125766Z warnings.warn( 2022-05-18T04:28:27.2618009Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:27.2618692Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:27.2620524Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:27.2621248Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:27.2696751Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:27.2697289Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:27.5895978Z ok (2.931s) 2022-05-18T04:28:27.6029788Z test_nested_all_wrapped_model_offload_false_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40524 2022-05-18T04:28:27.6137222Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40525 2022-05-18T04:28:28.4966738Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdkqpvadk 2022-05-18T04:28:28.4968211Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdkqpvadk/_remote_module_non_scriptable.py 2022-05-18T04:28:28.5138076Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgy6ucqwu 2022-05-18T04:28:28.5141145Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgy6ucqwu/_remote_module_non_scriptable.py 2022-05-18T04:28:28.5191626Z dist init r=1, world=2 2022-05-18T04:28:28.5196030Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:28.5354623Z dist init r=0, world=2 2022-05-18T04:28:28.5358780Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:28.5359573Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:28.5401774Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:29.8790732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:29.8791260Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:30.0773981Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:30.0782176Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:30.0803236Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:30.0803839Z warnings.warn( 2022-05-18T04:28:30.0811754Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:28:30.0812302Z warnings.warn( 2022-05-18T04:28:30.1312039Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:30.1312727Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:30.1315587Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:30.1316260Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:30.1392068Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:30.1393184Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:30.5212738Z ok (2.932s) 2022-05-18T04:28:30.5348563Z test_nested_all_wrapped_model_offload_false_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40607 2022-05-18T04:28:30.5453462Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40608 2022-05-18T04:28:31.4375807Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi3nlzd6e 2022-05-18T04:28:31.4376983Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi3nlzd6e/_remote_module_non_scriptable.py 2022-05-18T04:28:31.4601038Z dist init r=1, world=2 2022-05-18T04:28:31.4605527Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:31.4775874Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe8qz7whj 2022-05-18T04:28:31.4778604Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe8qz7whj/_remote_module_non_scriptable.py 2022-05-18T04:28:31.4992585Z dist init r=0, world=2 2022-05-18T04:28:31.4996854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:31.4997987Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:31.5014124Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:32.8376762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:32.8377713Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:33.0347988Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:33.0348525Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:33.0375603Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:28:33.0376160Z warnings.warn( 2022-05-18T04:28:33.0377226Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:33.0377796Z warnings.warn( 2022-05-18T04:28:33.0860516Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:33.0861211Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:33.0862512Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:33.0863172Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:33.0939041Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:33.0939545Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:33.4529803Z ok (2.932s) 2022-05-18T04:28:33.4664030Z test_nested_all_wrapped_model_offload_false_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40690 2022-05-18T04:28:33.4768923Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40691 2022-05-18T04:28:34.4108954Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm_qvj7t6 2022-05-18T04:28:34.4110018Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm_qvj7t6/_remote_module_non_scriptable.py 2022-05-18T04:28:34.4127852Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphjlc437c 2022-05-18T04:28:34.4130729Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphjlc437c/_remote_module_non_scriptable.py 2022-05-18T04:28:34.4324254Z dist init r=0, world=2 2022-05-18T04:28:34.4328568Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:34.4345138Z dist init r=1, world=2 2022-05-18T04:28:34.4349564Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:34.4350414Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:34.4431855Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:35.7631542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:35.7632069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:35.9623504Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:35.9629725Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
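Editor's note on the "Reducer buckets have been rebuilt in this iteration" INFO lines: they are logged by torch.nn.parallel.DistributedDataParallel once it has observed the actual gradient-ready order and re-bucketed its all-reduce buckets; in this suite they come from the DDP baseline that TestParityWithDDP compares the FSDP model against. A minimal single-process sketch of the API that produces them, assuming a free default port for a "gloo" group:

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = DDP(torch.nn.Linear(8, 8))
    for _ in range(2):
        model(torch.randn(4, 8)).sum().backward()
        # After the first iterations DDP may log that its reducer buckets
        # were rebuilt to match the observed execution order.

    dist.destroy_process_group()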
2022-05-18T04:28:35.9652142Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:35.9652730Z warnings.warn( 2022-05-18T04:28:35.9658829Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:35.9659377Z warnings.warn( 2022-05-18T04:28:36.0125466Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:36.0126398Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:36.0127347Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:36.0127981Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:36.0202202Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:36.0202703Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:36.3846170Z ok (2.931s) 2022-05-18T04:28:36.3978547Z test_nested_all_wrapped_model_offload_false_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40773 2022-05-18T04:28:36.4084667Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40774 2022-05-18T04:28:37.3010396Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvn4orbqy 2022-05-18T04:28:37.3011567Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvn4orbqy/_remote_module_non_scriptable.py 2022-05-18T04:28:37.3233833Z dist init r=0, world=2 2022-05-18T04:28:37.3238394Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:37.3486251Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_x6ftmku 2022-05-18T04:28:37.3488778Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_x6ftmku/_remote_module_non_scriptable.py 2022-05-18T04:28:37.3703225Z dist init r=1, world=2 2022-05-18T04:28:37.3707572Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:37.3708389Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:37.3748405Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:28:38.7252780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:38.7253839Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:38.9295919Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:38.9323710Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:38.9325691Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:38.9326793Z warnings.warn( 2022-05-18T04:28:38.9328279Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:38.9329379Z warnings.warn( 2022-05-18T04:28:38.9677875Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:38.9679216Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:38.9690168Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:38.9691611Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:38.9770211Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:38.9771158Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:39.3162197Z ok (2.931s) 2022-05-18T04:28:39.3295036Z test_nested_all_wrapped_model_offload_false_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40856 2022-05-18T04:28:39.3403541Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40857 2022-05-18T04:28:40.2418781Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqufwu75p 2022-05-18T04:28:40.2419969Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqufwu75p/_remote_module_non_scriptable.py 2022-05-18T04:28:40.2433946Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptlqrfbwn 2022-05-18T04:28:40.2436311Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptlqrfbwn/_remote_module_non_scriptable.py 2022-05-18T04:28:40.2647702Z dist init r=0, world=2 2022-05-18T04:28:40.2652482Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:40.2652905Z dist init r=1, world=2 2022-05-18T04:28:40.2656999Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:40.2658076Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
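Editor's note on the "Added key: store_based_barrier_key:N to store for rank: R" / "Completed store-based barrier" pairs: this is c10d's rendezvous after process-group creation, where every rank increments a counter under a shared key in the store and waits until the count reaches the world size. A conceptual sketch of that idea against a TCPStore; the real implementation in torch.distributed.distributed_c10d adds timeouts and logging, and the port below is an assumption:

    import torch.distributed as dist

    def store_based_barrier(store, world_size: int, key: str) -> None:
        # "Added key: ... to store for rank: N": each rank bumps the counter.
        arrived = store.add(key, 1)
        # "Completed store-based barrier ... with N nodes": wait for everyone.
        while arrived < world_size:
            arrived = store.add(key, 0)   # add(.., 0) just reads the current count

    # Single-process illustration (assumes port 29501 is free on this host).
    store = dist.TCPStore("127.0.0.1", 29501, 1, True)
    store_based_barrier(store, world_size=1, key="store_based_barrier_key:1")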
2022-05-18T04:28:40.2755913Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:41.6394078Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:41.6395088Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:41.8380222Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:41.8381167Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:41.8408996Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:41.8410134Z warnings.warn( 2022-05-18T04:28:41.8411622Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:41.8412648Z warnings.warn( 2022-05-18T04:28:41.8762577Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:41.8763892Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:41.8772297Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:41.8773696Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:41.8852740Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:41.8854150Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:42.2480489Z ok (2.932s) 2022-05-18T04:28:42.2611081Z test_nested_all_wrapped_model_offload_false_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40939 2022-05-18T04:28:42.2715622Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40940 2022-05-18T04:28:43.2066196Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq2vaobqp 2022-05-18T04:28:43.2067430Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq2vaobqp/_remote_module_non_scriptable.py 2022-05-18T04:28:43.2072092Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6ozf27et 2022-05-18T04:28:43.2074888Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6ozf27et/_remote_module_non_scriptable.py 2022-05-18T04:28:43.2290488Z dist init r=1, world=2 2022-05-18T04:28:43.2291436Z dist init r=0, world=2 2022-05-18T04:28:43.2296347Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:43.2297286Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:43.2298662Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:43.2300179Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:44.5886742Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:44.5887739Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:44.8177397Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:44.8179166Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:44.8205280Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:44.8205864Z warnings.warn( 2022-05-18T04:28:44.8209240Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:44.8209815Z warnings.warn( 2022-05-18T04:28:44.8704807Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:44.8705649Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:44.8707174Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:44.8707844Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:44.8784440Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:44.8785245Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:28:45.2795000Z ok (3.031s) 2022-05-18T04:28:45.2927987Z test_nested_all_wrapped_model_offload_false_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41022 2022-05-18T04:28:45.3033000Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41023 2022-05-18T04:28:46.1959819Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl1p1epor 2022-05-18T04:28:46.1960696Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl1p1epor/_remote_module_non_scriptable.py 2022-05-18T04:28:46.1985698Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdzushuj_ 2022-05-18T04:28:46.1989131Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdzushuj_/_remote_module_non_scriptable.py 2022-05-18T04:28:46.2173707Z dist init r=0, world=2 2022-05-18T04:28:46.2177837Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:46.2211258Z dist init r=1, world=2 2022-05-18T04:28:46.2216087Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:46.2217268Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:46.2281977Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:47.5568662Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:47.5569243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:47.7533319Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:47.7533869Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:47.7561169Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:47.7561726Z warnings.warn( 2022-05-18T04:28:47.7562510Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:47.7563050Z warnings.warn( 2022-05-18T04:28:47.8051706Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:47.8052394Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:47.8053333Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:28:47.8053988Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:47.8127548Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:47.8128060Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:48.2110739Z ok (2.931s) 2022-05-18T04:28:48.2244996Z test_nested_all_wrapped_model_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41105 2022-05-18T04:28:48.2351209Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41106 2022-05-18T04:28:49.1443540Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp94lvywmo 2022-05-18T04:28:49.1444508Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp94lvywmo/_remote_module_non_scriptable.py 2022-05-18T04:28:49.1666819Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfsjwrdal 2022-05-18T04:28:49.1667458Z dist init r=1, world=2 2022-05-18T04:28:49.1670001Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfsjwrdal/_remote_module_non_scriptable.py 2022-05-18T04:28:49.1671856Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:49.1888956Z dist init r=0, world=2 2022-05-18T04:28:49.1893135Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:49.1894046Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:49.1979049Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:50.5304149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:50.5304888Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:50.7375476Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:50.7383787Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:50.7403352Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:50.7404286Z warnings.warn( 2022-05-18T04:28:50.7414326Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:50.7414881Z warnings.warn( 2022-05-18T04:28:50.7901021Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:50.7901696Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:50.7902863Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:50.7903509Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:50.7981043Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:50.7981988Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:51.1428307Z ok (2.932s) 2022-05-18T04:28:51.1559668Z test_nested_all_wrapped_model_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41188 2022-05-18T04:28:51.1666578Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41189 2022-05-18T04:28:52.0635357Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdim0gjio 2022-05-18T04:28:52.0636299Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdim0gjio/_remote_module_non_scriptable.py 2022-05-18T04:28:52.0728714Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgazhyrq_ 2022-05-18T04:28:52.0731500Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgazhyrq_/_remote_module_non_scriptable.py 2022-05-18T04:28:52.0851043Z dist init r=1, world=2 2022-05-18T04:28:52.0855070Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:52.0945727Z dist init r=0, world=2 2022-05-18T04:28:52.0950230Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:52.0951052Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:52.0958213Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:53.4346274Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:53.4346816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:53.6388174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:53.6389145Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:53.6416829Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:53.6417931Z warnings.warn( 2022-05-18T04:28:53.6419405Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:53.6420823Z warnings.warn( 2022-05-18T04:28:53.6891791Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:28:53.6893203Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:53.6895160Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:53.6896519Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:53.6971851Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:53.6972822Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:54.0750953Z ok (2.932s) 2022-05-18T04:28:54.0885231Z test_nested_all_wrapped_model_offload_true_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41271 2022-05-18T04:28:54.0990653Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41272 2022-05-18T04:28:54.9947998Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6cpq65ef 2022-05-18T04:28:54.9948812Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6cpq65ef/_remote_module_non_scriptable.py 2022-05-18T04:28:54.9961681Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg_zvt_vv 2022-05-18T04:28:54.9964786Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg_zvt_vv/_remote_module_non_scriptable.py 2022-05-18T04:28:55.0164257Z dist init r=0, world=2 2022-05-18T04:28:55.0168453Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:55.0188888Z dist init r=1, world=2 2022-05-18T04:28:55.0193327Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:55.0194654Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:55.0271893Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:56.3641336Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:56.3641890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:56.5626021Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:56.5633752Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:56.5654517Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:56.5655093Z warnings.warn( 2022-05-18T04:28:56.5662040Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:56.5662604Z warnings.warn( 2022-05-18T04:28:56.5762507Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
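The UserWarning from fully_sharded_data_parallel.py above notes that a module handed to FSDP on CPU is moved to the local GPU for parameter verification, flattening and sharding, then moved back afterwards. A plausible way to skip that round trip is to place the module on its target device before wrapping; this is a sketch only, assuming the process group is already initialized and each rank owns one CUDA device, with nn.Linear standing in for the real model:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # assumes torch.distributed.init_process_group(...) has already run on every rank
    rank = torch.distributed.get_rank()
    device = torch.device("cuda", rank % torch.cuda.device_count())

    model = nn.Linear(8, 8).to(device)  # move to the target device before wrapping
    sharded = FSDP(model)               # wrap a module that already lives on the GPU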
2022-05-18T04:28:56.5763009Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:56.6227860Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:56.6228804Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:56.6229749Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:56.6230381Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:56.6302426Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:56.6302927Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:56.6622767Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:28:56.6623845Z return iter(self.unbind(0)) 2022-05-18T04:28:56.6624989Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:28:56.6625772Z return iter(self.unbind(0)) 2022-05-18T04:28:57.0066139Z ok (2.931s) 2022-05-18T04:28:57.0199872Z test_nested_all_wrapped_model_offload_true_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41354 2022-05-18T04:28:57.0306562Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41355 2022-05-18T04:28:57.9452237Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpstlcpqlk 2022-05-18T04:28:57.9453449Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpstlcpqlk/_remote_module_non_scriptable.py 2022-05-18T04:28:57.9676205Z dist init r=1, world=2 2022-05-18T04:28:57.9680694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:28:57.9839254Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm_kk72fd 2022-05-18T04:28:57.9841927Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm_kk72fd/_remote_module_non_scriptable.py 2022-05-18T04:28:58.0054785Z dist init r=0, world=2 2022-05-18T04:28:58.0058892Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:28:58.0059922Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:28:58.0089113Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:28:59.3492010Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:59.3492582Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:59.5451266Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:59.5451814Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:59.5479153Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:59.5480039Z warnings.warn( 2022-05-18T04:28:59.5480807Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:28:59.5481341Z warnings.warn( 2022-05-18T04:28:59.5578930Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:59.5579444Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:59.6044779Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:59.6045476Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:59.6046581Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:28:59.6047233Z warnings.warn(msg, FutureWarning) 2022-05-18T04:28:59.6119126Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:59.6119636Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:28:59.6443628Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:28:59.6444431Z return iter(self.unbind(0)) 2022-05-18T04:28:59.6445761Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 
2022-05-18T04:28:59.6446543Z return iter(self.unbind(0)) 2022-05-18T04:28:59.9384302Z ok (2.932s) 2022-05-18T04:28:59.9517931Z test_nested_all_wrapped_model_offload_true_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41437 2022-05-18T04:28:59.9623420Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41438 2022-05-18T04:29:00.8564002Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb79ebmkn 2022-05-18T04:29:00.8565134Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb79ebmkn/_remote_module_non_scriptable.py 2022-05-18T04:29:00.8780348Z dist init r=1, world=2 2022-05-18T04:29:00.8785060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:00.9049510Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc7c00920 2022-05-18T04:29:00.9052772Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc7c00920/_remote_module_non_scriptable.py 2022-05-18T04:29:00.9275536Z dist init r=0, world=2 2022-05-18T04:29:00.9281113Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:00.9282534Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:00.9295644Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:02.2858990Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:02.2859951Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:02.4904833Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:02.4905856Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:02.4934843Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:02.4935971Z warnings.warn( 2022-05-18T04:29:02.4937447Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:02.4938532Z warnings.warn( 2022-05-18T04:29:02.5047815Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:02.5048792Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:02.5653212Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:02.5654584Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:02.5657214Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:02.5658634Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:02.5738551Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:02.5739531Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:02.6010208Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:02.6012866Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:02.6015377Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:02.6017942Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:02.6020494Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:02.6023283Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:02.9702079Z ok (3.032s) 2022-05-18T04:29:02.9836143Z test_nested_all_wrapped_model_offload_true_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41520 2022-05-18T04:29:02.9944974Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41521 2022-05-18T04:29:03.8863549Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl5znvte6 2022-05-18T04:29:03.8864922Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl5znvte6/_remote_module_non_scriptable.py 2022-05-18T04:29:03.9080431Z dist init r=0, world=2 2022-05-18T04:29:03.9084580Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:03.9288671Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6m5h0ssp 2022-05-18T04:29:03.9291403Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6m5h0ssp/_remote_module_non_scriptable.py 2022-05-18T04:29:03.9505238Z dist init r=1, world=2 2022-05-18T04:29:03.9509523Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:03.9510322Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:03.9594395Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:05.2862396Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:05.2862945Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:05.4883138Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:05.4890685Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:05.4911217Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:05.4911793Z warnings.warn( 2022-05-18T04:29:05.4920277Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:05.4920831Z warnings.warn( 2022-05-18T04:29:05.5025812Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:05.5026300Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:05.5610979Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:05.5611652Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:05.5612584Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:05.5613385Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:05.5686822Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:05.5687654Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:05.5950912Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:05.5952264Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:05.5953542Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:05.5954807Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:05.5956077Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:05.5957328Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:05.9021328Z ok (2.932s) 2022-05-18T04:29:05.9155896Z test_nested_all_wrapped_model_offload_true_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41603 2022-05-18T04:29:05.9261154Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41604 2022-05-18T04:29:06.8260291Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeuhpbfh1 2022-05-18T04:29:06.8261502Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeuhpbfh1/_remote_module_non_scriptable.py 2022-05-18T04:29:06.8482991Z dist init r=1, world=2 2022-05-18T04:29:06.8487038Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:06.8544626Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzx48na8s 2022-05-18T04:29:06.8547406Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzx48na8s/_remote_module_non_scriptable.py 2022-05-18T04:29:06.8761080Z dist init r=0, world=2 2022-05-18T04:29:06.8764921Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:06.8765778Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:06.8793695Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:08.2220621Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:08.2221143Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:08.4267334Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:08.4268073Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:08.4295387Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:08.4296232Z warnings.warn( 2022-05-18T04:29:08.4296989Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:08.4297539Z warnings.warn( 2022-05-18T04:29:08.4401624Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:08.4402128Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:08.4973153Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:08.4974127Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:08.4975087Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:08.4975753Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:08.5048246Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:08.5048756Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:08.5305067Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:08.5306571Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:08.5307869Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:08.5309177Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:08.5310443Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:08.5311830Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:08.8339411Z ok (2.932s) 2022-05-18T04:29:08.8471060Z test_nested_all_wrapped_model_offload_true_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41686 2022-05-18T04:29:08.8575983Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41687 2022-05-18T04:29:09.7594548Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3b_m3_vh 2022-05-18T04:29:09.7595658Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3b_m3_vh/_remote_module_non_scriptable.py 2022-05-18T04:29:09.7809400Z dist init r=0, world=2 2022-05-18T04:29:09.7813503Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:09.8117161Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpftdd43uu 2022-05-18T04:29:09.8120106Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpftdd43uu/_remote_module_non_scriptable.py 2022-05-18T04:29:09.8344161Z dist init r=1, world=2 2022-05-18T04:29:09.8348653Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:09.8349877Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:09.8425234Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:11.1737524Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:11.1738061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:11.3790063Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:11.3797837Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:11.3819275Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:11.3820272Z warnings.warn( 2022-05-18T04:29:11.3827456Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:11.3828024Z warnings.warn( 2022-05-18T04:29:11.3936874Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:11.3937400Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:11.4522601Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:11.4523259Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:11.4526173Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:11.4526845Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:11.4603293Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:11.4603774Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:11.4863296Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:11.4864841Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:11.4866116Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:11.4867394Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:11.4868660Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:11.4869919Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:11.8655198Z ok (3.031s) 2022-05-18T04:29:11.8792155Z test_nested_all_wrapped_model_offload_true_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41769 2022-05-18T04:29:11.8899453Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41770 2022-05-18T04:29:12.8339208Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4_52abj8 2022-05-18T04:29:12.8340100Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4_52abj8/_remote_module_non_scriptable.py 2022-05-18T04:29:12.8456943Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx5whymeh 2022-05-18T04:29:12.8459682Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx5whymeh/_remote_module_non_scriptable.py 2022-05-18T04:29:12.8558101Z dist init r=1, world=2 2022-05-18T04:29:12.8562252Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:12.8682351Z dist init r=0, world=2 2022-05-18T04:29:12.8686679Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:12.8687572Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:12.8767551Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:14.2226253Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:14.2226803Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:14.4280202Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:14.4280737Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:14.4308193Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:14.4308768Z warnings.warn( 2022-05-18T04:29:14.4309554Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:14.4310094Z warnings.warn( 2022-05-18T04:29:14.4408358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:14.4408880Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:14.4877435Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:14.4878130Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:14.4879076Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:14.4879722Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:14.4951924Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:14.4952437Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:14.5273174Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:29:14.5273961Z return iter(self.unbind(0)) 2022-05-18T04:29:14.5275304Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:29:14.5276077Z return iter(self.unbind(0)) 2022-05-18T04:29:14.7976422Z ok (2.932s) 2022-05-18T04:29:14.8110662Z test_nested_all_wrapped_model_offload_true_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41852 2022-05-18T04:29:14.8217792Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41853 2022-05-18T04:29:15.6961381Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfx4jsn82 2022-05-18T04:29:15.6962248Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfx4jsn82/_remote_module_non_scriptable.py 2022-05-18T04:29:15.7038676Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgcb2jp5m 2022-05-18T04:29:15.7041725Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgcb2jp5m/_remote_module_non_scriptable.py 2022-05-18T04:29:15.7177008Z dist init r=1, world=2 2022-05-18T04:29:15.7181362Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:15.7263714Z dist init r=0, world=2 2022-05-18T04:29:15.7268300Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:15.7269376Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:15.7284274Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:17.0652369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:17.0652877Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:17.2629772Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:17.2630346Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:29:17.2658169Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:17.2658744Z warnings.warn( 2022-05-18T04:29:17.2659504Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:17.2660043Z warnings.warn( 2022-05-18T04:29:17.2759599Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:17.2760104Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:17.3233020Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:17.3233708Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:17.3234924Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:17.3235604Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:17.3308617Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:17.3309099Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:17.3636037Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:29:17.3636831Z return iter(self.unbind(0)) 2022-05-18T04:29:17.3637972Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:29:17.3638744Z return iter(self.unbind(0)) 2022-05-18T04:29:17.7293694Z ok (2.932s) 2022-05-18T04:29:17.7425376Z test_nested_all_wrapped_model_offload_true_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
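Test names such as test_nested_all_wrapped_model_offload_true_prefetch_post_none_clip_norm_type_2_0 appear to be generated from the FSDP configuration under test: parameter CPU offload, backward-prefetch timing, and the sharding strategy. A hedged sketch of what such a configuration looks like with the public torch.distributed.fsdp API, assuming the name fragments map onto these constructor arguments:

    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.fsdp import BackwardPrefetch, CPUOffload, ShardingStrategy

    # Assumes an initialized process group; the Linear is a stand-in for the nested test model.
    model = nn.Linear(8, 8).cuda()
    fsdp_model = FSDP(
        model,
        cpu_offload=CPUOffload(offload_params=True),       # "offload_true"
        backward_prefetch=BackwardPrefetch.BACKWARD_POST,  # "prefetch_post" ("prefetch_pre" -> BACKWARD_PRE)
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,  # "shard_grad_op"; "no_shard" -> NO_SHARD,
        # while "none" in the test name would leave the default (full sharding) in place
    )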
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41935 2022-05-18T04:29:17.7530761Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41936 2022-05-18T04:29:18.6425490Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn3v4eijo 2022-05-18T04:29:18.6426967Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn3v4eijo/_remote_module_non_scriptable.py 2022-05-18T04:29:18.6649162Z dist init r=1, world=2 2022-05-18T04:29:18.6653712Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:18.6903438Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1trexabz 2022-05-18T04:29:18.6906263Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1trexabz/_remote_module_non_scriptable.py 2022-05-18T04:29:18.7120722Z dist init r=0, world=2 2022-05-18T04:29:18.7125058Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:18.7126072Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:18.7163784Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:20.0599964Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:20.0600534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:20.2635279Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:20.2643257Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:20.2664001Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:20.2664878Z warnings.warn( 2022-05-18T04:29:20.2673645Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:20.2674194Z warnings.warn( 2022-05-18T04:29:20.2782328Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:20.2782862Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:20.3390454Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:20.3391140Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:20.3394168Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:20.3394838Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:20.3469953Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:20.3471174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:20.3742241Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:20.3744365Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:20.3745658Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:20.3746937Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:20.3748206Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:20.3749476Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:20.7608834Z ok (3.031s) 2022-05-18T04:29:20.7742566Z test_nested_all_wrapped_model_offload_true_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
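The repeated [W python_variable.cpp:205] messages above concern deallocating a Tensor whose Python object still has live weak references; the warning's own remedy is to call _fix_weakref() after dereferencing. A purely illustrative sketch of that pattern (_fix_weakref is an internal Tensor method, shown here only to make the message concrete):

    import weakref
    import torch

    t = torch.ones(4)
    ref = weakref.ref(t)   # take out a weak reference to the Tensor
    _ = ref()              # dereference it
    t._fix_weakref()       # per the warning text, call _fix_weakref() afterwards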
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42018 2022-05-18T04:29:20.7848095Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42019 2022-05-18T04:29:21.6783856Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplzqzzrfx 2022-05-18T04:29:21.6785074Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplzqzzrfx/_remote_module_non_scriptable.py 2022-05-18T04:29:21.7000482Z dist init r=0, world=2 2022-05-18T04:29:21.7005107Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:21.7312278Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp1nm5wgt 2022-05-18T04:29:21.7314976Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp1nm5wgt/_remote_module_non_scriptable.py 2022-05-18T04:29:21.7545595Z dist init r=1, world=2 2022-05-18T04:29:21.7550413Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:21.7551620Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:21.7616054Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:23.0889471Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:23.0890009Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:23.2905637Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:23.2906201Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:23.2933745Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:23.2934640Z warnings.warn( 2022-05-18T04:29:23.2935416Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:23.2935953Z warnings.warn( 2022-05-18T04:29:23.3044887Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:23.3045399Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:23.3649982Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:23.3650670Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:23.3652220Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
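The fully_sharded_data_parallel.py:911 UserWarning above fires because the test hands FSDP a CPU-resident module, which FSDP temporarily moves to the rank's GPU for parameter verification, flattening, and sharding. A small sketch of how the warning is normally avoided, assuming the wrapped module is meant to live on the GPU anyway: put it on the target device before wrapping.

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes an initialized process group with one GPU per rank.
    rank = torch.distributed.get_rank()
    device = torch.device("cuda", rank)

    module = nn.Linear(16, 16).to(device)  # already on the GPU: no "Module is input on CPU" warning
    wrapped = FSDP(module)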
2022-05-18T04:29:23.3652862Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:23.3730514Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:23.3731020Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:23.4002207Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:23.4003501Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:23.4004939Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:23.4006232Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:23.4007503Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:23.4008800Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:23.6924163Z ok (2.931s) 2022-05-18T04:29:23.7064991Z test_nested_all_wrapped_model_offload_true_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
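The clip_norm_type_2_0 / clip_norm_type_None variants presumably toggle gradient clipping in the training loop; a sketch of clipping through the FSDP wrapper, assuming the tests use FSDP's clip_grad_norm_ (the max_norm value is a placeholder):

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes an initialized process group (see the init sketch above).
    model = FSDP(nn.Linear(8, 8).cuda())
    optim = torch.optim.SGD(model.parameters(), lr=0.01)

    loss = model(torch.randn(4, 8, device="cuda")).sum()
    loss.backward()
    # norm_type=2.0 matches the "_2_0" variant; the "_None" variant simply skips clipping.
    model.clip_grad_norm_(max_norm=0.3, norm_type=2.0)
    optim.step()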
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42101 2022-05-18T04:29:23.7170592Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42102 2022-05-18T04:29:24.6091987Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpchmvwa46 2022-05-18T04:29:24.6092823Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpchmvwa46/_remote_module_non_scriptable.py 2022-05-18T04:29:24.6316767Z dist init r=0, world=2 2022-05-18T04:29:24.6320843Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:24.6419770Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2r81_2cv 2022-05-18T04:29:24.6422770Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2r81_2cv/_remote_module_non_scriptable.py 2022-05-18T04:29:24.6636829Z dist init r=1, world=2 2022-05-18T04:29:24.6641359Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:24.6642180Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:24.6729791Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:26.0331196Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:26.0331706Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:26.2325678Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:26.2326237Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:26.2354229Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:26.2354791Z warnings.warn( 2022-05-18T04:29:26.2355570Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:26.2356105Z warnings.warn( 2022-05-18T04:29:26.2464799Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:26.2465282Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:26.3055775Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:26.3056492Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:26.3059329Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:26.3060007Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:26.3138253Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:26.3138763Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:26.3400123Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:26.3401674Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:26.3403121Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:26.3404387Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:26.3405637Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:26.3406902Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:26.7249999Z ok (3.032s) 2022-05-18T04:29:26.7384052Z test_nested_all_wrapped_model_offload_true_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42184 2022-05-18T04:29:26.7493710Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42185 2022-05-18T04:29:27.6370167Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgbavo9wt 2022-05-18T04:29:27.6372027Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgbavo9wt/_remote_module_non_scriptable.py 2022-05-18T04:29:27.6595669Z dist init r=1, world=2 2022-05-18T04:29:27.6600082Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:27.6775134Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpohlwmk7m 2022-05-18T04:29:27.6778009Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpohlwmk7m/_remote_module_non_scriptable.py 2022-05-18T04:29:27.6992255Z dist init r=0, world=2 2022-05-18T04:29:27.6996534Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:27.6997983Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:27.7008895Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:29.0441945Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:29.0442485Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:29.2399210Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:29.2399742Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:29.2427060Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:29.2427631Z warnings.warn( 2022-05-18T04:29:29.2428387Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:29.2429257Z warnings.warn( 2022-05-18T04:29:29.2533027Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:29.2533584Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:29.3107974Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:29.3108661Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:29.3109594Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:29.3110246Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:29.3182165Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:29.3182648Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:29.3439880Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:29.3441572Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:29.3442850Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:29.3444304Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:29.3445585Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:29.3446841Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:29.6569552Z ok (2.932s) 2022-05-18T04:29:29.6703857Z test_nested_all_wrapped_model_offload_true_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42267 2022-05-18T04:29:29.6809846Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42268 2022-05-18T04:29:30.5801968Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp648zml49 2022-05-18T04:29:30.5803411Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp648zml49/_remote_module_non_scriptable.py 2022-05-18T04:29:30.5833050Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7p906a95 2022-05-18T04:29:30.5835455Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7p906a95/_remote_module_non_scriptable.py 2022-05-18T04:29:30.6019217Z dist init r=0, world=2 2022-05-18T04:29:30.6023545Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:30.6057137Z dist init r=1, world=2 2022-05-18T04:29:30.6061735Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:30.6062672Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:30.6127205Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:31.9471523Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:31.9472490Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:32.1482953Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:32.1489841Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:32.1511789Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:32.1512874Z warnings.warn( 2022-05-18T04:29:32.1520831Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:32.1521983Z warnings.warn( 2022-05-18T04:29:32.1622664Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:32.1623927Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:32.2100817Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:32.2102238Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:32.2107061Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:32.2108414Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:32.2181759Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:32.2182735Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:32.2516348Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:29:32.2517135Z return iter(self.unbind(0)) 2022-05-18T04:29:32.2518789Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:29:32.2519765Z return iter(self.unbind(0)) 2022-05-18T04:29:32.5884906Z ok (2.931s) 2022-05-18T04:29:32.6018336Z test_nested_all_wrapped_model_offload_true_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42350 2022-05-18T04:29:32.6122750Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42351 2022-05-18T04:29:33.5170717Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpksuvs2ww 2022-05-18T04:29:33.5171552Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpksuvs2ww/_remote_module_non_scriptable.py 2022-05-18T04:29:33.5390001Z dist init r=1, world=2 2022-05-18T04:29:33.5394242Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:33.5452796Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyisw0env 2022-05-18T04:29:33.5455536Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyisw0env/_remote_module_non_scriptable.py 2022-05-18T04:29:33.5668710Z dist init r=0, world=2 2022-05-18T04:29:33.5672860Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:33.5673964Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:33.5700777Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:34.9105813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:34.9106714Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:35.1071446Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:35.1077296Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
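The "Reducer buckets have been rebuilt in this iteration" INFO lines come from the DistributedDataParallel reference model that these parity tests train alongside FSDP; DDP rebuilds its gradient buckets once after the first iteration. A minimal sketch of the DDP side (toy model, assumes an initialized process group and one GPU per rank):

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    rank = torch.distributed.get_rank()
    model = nn.Linear(8, 8).to(rank)
    ddp_model = DDP(model, device_ids=[rank])  # logs the reducer-bucket INFO after the first step

    out = ddp_model(torch.randn(4, 8, device=rank))
    out.sum().backward()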
2022-05-18T04:29:35.1099207Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:35.1100091Z warnings.warn( 2022-05-18T04:29:35.1108063Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:35.1108632Z warnings.warn( 2022-05-18T04:29:35.1208775Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:35.1210526Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:35.1689262Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:35.1689950Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:35.1694333Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:35.1694998Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:35.1770401Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:35.1771769Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:35.2105978Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:29:35.2106982Z return iter(self.unbind(0)) 2022-05-18T04:29:35.2108391Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:29:35.2109184Z return iter(self.unbind(0)) 2022-05-18T04:29:35.5198091Z ok (2.931s) 2022-05-18T04:29:35.5331690Z test_nested_all_wrapped_model_offload_true_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42433 2022-05-18T04:29:35.5436126Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42434 2022-05-18T04:29:36.4442802Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg0u0f31i 2022-05-18T04:29:36.4444149Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg0u0f31i/_remote_module_non_scriptable.py 2022-05-18T04:29:36.4668244Z dist init r=1, world=2 2022-05-18T04:29:36.4672840Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:36.4755407Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8xmi3ikv 2022-05-18T04:29:36.4758305Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8xmi3ikv/_remote_module_non_scriptable.py 2022-05-18T04:29:36.4970583Z dist init r=0, world=2 2022-05-18T04:29:36.4974597Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:36.4975396Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:36.4979643Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:37.8400780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:37.8401320Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:38.0438485Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:38.0439055Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:38.0466207Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:38.0466777Z warnings.warn( 2022-05-18T04:29:38.0467561Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:38.0468100Z warnings.warn( 2022-05-18T04:29:38.0572933Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:38.0573419Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:38.1175806Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:38.1176666Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:38.1177602Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:38.1178253Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:38.1251625Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:38.1252287Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:38.1520712Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:38.1522072Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:38.1523365Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:38.1524648Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:38.1526125Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:38.1527407Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:38.4512397Z ok (2.931s) 2022-05-18T04:29:38.4646211Z test_nested_all_wrapped_model_offload_true_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42516 2022-05-18T04:29:38.4753908Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42517 2022-05-18T04:29:39.3809060Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3y1ioxay 2022-05-18T04:29:39.3810700Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3y1ioxay/_remote_module_non_scriptable.py 2022-05-18T04:29:39.4034535Z dist init r=0, world=2 2022-05-18T04:29:39.4039022Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:39.4199681Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm_31ma5r 2022-05-18T04:29:39.4202431Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm_31ma5r/_remote_module_non_scriptable.py 2022-05-18T04:29:39.4424342Z dist init r=1, world=2 2022-05-18T04:29:39.4428844Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:39.4430029Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:39.4447543Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:40.7954120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:40.7954634Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:40.9963629Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:40.9971204Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:40.9992197Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:40.9992777Z warnings.warn( 2022-05-18T04:29:40.9999989Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:41.0000531Z warnings.warn( 2022-05-18T04:29:41.0106695Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:41.0107198Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:41.0705176Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:41.0705868Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:41.0707352Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:41.0708344Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:41.0782008Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:41.0782512Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:41.1051883Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:41.1053231Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:41.1054490Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:41.1055765Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:41.1057211Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:41.1058477Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:41.4833058Z ok (3.032s) 2022-05-18T04:29:41.4967781Z test_nested_all_wrapped_model_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42599 2022-05-18T04:29:41.5073565Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42600 2022-05-18T04:29:42.4094860Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa8crog7e 2022-05-18T04:29:42.4095771Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa8crog7e/_remote_module_non_scriptable.py 2022-05-18T04:29:42.4167491Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpipihljur 2022-05-18T04:29:42.4170621Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpipihljur/_remote_module_non_scriptable.py 2022-05-18T04:29:42.4309706Z dist init r=0, world=2 2022-05-18T04:29:42.4313799Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:42.4392894Z dist init r=1, world=2 2022-05-18T04:29:42.4397170Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:42.4398024Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:42.4417017Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:43.7783287Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:43.7784612Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:43.9794188Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:43.9802712Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:43.9822222Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:43.9822819Z warnings.warn( 2022-05-18T04:29:43.9833530Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:43.9834086Z warnings.warn( 2022-05-18T04:29:43.9941981Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:43.9942841Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:44.0532363Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:44.0533332Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:44.0534575Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:44.0535236Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:44.0611375Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:44.0612208Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:44.0876137Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:44.0877466Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:44.0878753Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:44.0880023Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:44.0881300Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:44.0882768Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:44.4150696Z ok (2.932s) 2022-05-18T04:29:44.4283549Z test_nested_all_wrapped_model_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42682 2022-05-18T04:29:44.4389640Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42683 2022-05-18T04:29:45.3373470Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0s0xv24s 2022-05-18T04:29:45.3374750Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0s0xv24s/_remote_module_non_scriptable.py 2022-05-18T04:29:45.3598300Z dist init r=1, world=2 2022-05-18T04:29:45.3602360Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:45.3729503Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv67bh43d 2022-05-18T04:29:45.3732163Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv67bh43d/_remote_module_non_scriptable.py 2022-05-18T04:29:45.3945949Z dist init r=0, world=2 2022-05-18T04:29:45.3950213Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:45.3951274Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:45.4011061Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:46.7434538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:46.7435079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:46.9452773Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:46.9453279Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:46.9480821Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:46.9481407Z warnings.warn( 2022-05-18T04:29:46.9482167Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:46.9482687Z warnings.warn( 2022-05-18T04:29:46.9588941Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:46.9589896Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:47.0179175Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:47.0179892Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:47.0182396Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:47.0183074Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:47.0258160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:47.0259232Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:47.0521786Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:47.0523086Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:47.0524367Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:47.0525644Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:47.0527068Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:47.0528337Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:29:47.3465483Z ok (2.931s) 2022-05-18T04:29:47.3596037Z test_nested_wrapped_model_offload_false_none_no_shard (__main__.TestParityWithDDP) ... 
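Editor's note: the parametrized test names in this block (e.g. test_nested_wrapped_model_offload_false_none_no_shard) appear to encode an FSDP configuration. The sketch below is only my reading of the name fragments, not taken from the test source; the mapping of "offload"/"prefetch"/"shard" onto the FSDP constructor arguments is an assumption, and the wrapped module is a placeholder. It also assumes a process group is already initialized, as the test harness does.

```python
# A minimal sketch, assuming "offload_false_none_no_shard" means: no CPU offload,
# no backward prefetch, NO_SHARD sharding strategy. Hypothetical mapping only.
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    CPUOffload,
    ShardingStrategy,
)

def build_fsdp(module: nn.Module) -> FSDP:
    # Requires torch.distributed to be initialized before FSDP construction.
    return FSDP(
        module,
        cpu_offload=CPUOffload(offload_params=False),  # "offload_false"
        backward_prefetch=None,                        # "none": no prefetch
        sharding_strategy=ShardingStrategy.NO_SHARD,   # "no_shard"
    )
```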
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42765 2022-05-18T04:29:47.3700975Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42766 2022-05-18T04:29:48.2668563Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9bat7_mx 2022-05-18T04:29:48.2669730Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9bat7_mx/_remote_module_non_scriptable.py 2022-05-18T04:29:48.2894337Z dist init r=1, world=2 2022-05-18T04:29:48.2898685Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:48.2955772Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4n1lzbp_ 2022-05-18T04:29:48.2958627Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4n1lzbp_/_remote_module_non_scriptable.py 2022-05-18T04:29:48.3173241Z dist init r=0, world=2 2022-05-18T04:29:48.3177254Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:48.3178283Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:48.3205245Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:49.6614085Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:49.6614661Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:49.8616681Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:49.8617538Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:49.8648680Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:49.8649254Z warnings.warn( 2022-05-18T04:29:49.8650020Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:49.8650540Z warnings.warn( 2022-05-18T04:29:49.8996430Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:49.8997112Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:49.9004814Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:49.9005639Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:49.9098089Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:49.9098596Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
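Editor's note: the FutureWarning above is self-describing, so a minimal migration sketch follows; the tensor values are illustrative only and not taken from the test.

```python
# Replacing the deprecated call flagged in the log with the suggested API.
import torch
from torch.testing import assert_close

actual = torch.tensor([1.0, 2.0, 3.0])
expected = torch.tensor([1.0, 2.0, 3.0])

# Old (deprecated since 1.12): torch.testing.assert_allclose(actual, expected)
assert_close(actual, expected)  # raises AssertionError on mismatch
```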
2022-05-18T04:29:50.2777315Z ok (2.931s) 2022-05-18T04:29:50.2909406Z test_nested_wrapped_model_offload_false_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42848 2022-05-18T04:29:50.3020307Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42849 2022-05-18T04:29:51.2292961Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkx9z1rz_ 2022-05-18T04:29:51.2294293Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkx9z1rz_/_remote_module_non_scriptable.py 2022-05-18T04:29:51.2422997Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpquef56rj 2022-05-18T04:29:51.2425812Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpquef56rj/_remote_module_non_scriptable.py 2022-05-18T04:29:51.2509436Z dist init r=1, world=2 2022-05-18T04:29:51.2513615Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:51.2640186Z dist init r=0, world=2 2022-05-18T04:29:51.2644464Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:51.2645702Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:51.2718944Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:52.5950138Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:52.5951111Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:52.7995717Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:52.8005131Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:52.8027527Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:52.8028679Z warnings.warn( 2022-05-18T04:29:52.8041782Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:52.8042918Z warnings.warn( 2022-05-18T04:29:52.8532217Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:52.8533620Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:52.8535583Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:52.8536948Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:52.8628882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:29:52.8629828Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:53.2096007Z ok (2.932s) 2022-05-18T04:29:53.2229042Z test_nested_wrapped_model_offload_false_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42931 2022-05-18T04:29:53.2335465Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42932 2022-05-18T04:29:54.2086576Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps_y6p2ce 2022-05-18T04:29:54.2087637Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps_y6p2ce/_remote_module_non_scriptable.py 2022-05-18T04:29:54.2125483Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkhfce75m 2022-05-18T04:29:54.2128541Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkhfce75m/_remote_module_non_scriptable.py 2022-05-18T04:29:54.2309157Z dist init r=0, world=2 2022-05-18T04:29:54.2313278Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:54.2341717Z dist init r=1, world=2 2022-05-18T04:29:54.2346182Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:54.2347640Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:54.2416751Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:55.5815767Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:55.5816312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:55.7848229Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:55.7848777Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:55.7879929Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:55.7880525Z warnings.warn( 2022-05-18T04:29:55.7881283Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:55.7881822Z warnings.warn( 2022-05-18T04:29:55.8359378Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:55.8360101Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:55.8362377Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:29:55.8363065Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:55.8453230Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:55.8453716Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:56.2413712Z ok (3.032s) 2022-05-18T04:29:56.2545236Z test_nested_wrapped_model_offload_false_prefetch_post_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43014 2022-05-18T04:29:56.2650502Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43015 2022-05-18T04:29:57.1762662Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm60hrhms 2022-05-18T04:29:57.1763954Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm60hrhms/_remote_module_non_scriptable.py 2022-05-18T04:29:57.1980295Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcdm425dk 2022-05-18T04:29:57.1983125Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcdm425dk/_remote_module_non_scriptable.py 2022-05-18T04:29:57.1986426Z dist init r=1, world=2 2022-05-18T04:29:57.1990737Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:29:57.2198440Z dist init r=0, world=2 2022-05-18T04:29:57.2203020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:29:57.2203835Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:57.2298033Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:29:58.5652438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:58.5652978Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:58.7648937Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:58.7657731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:58.7681892Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:58.7682474Z warnings.warn( 2022-05-18T04:29:58.7690325Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:29:58.7690878Z warnings.warn( 2022-05-18T04:29:58.8041793Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:58.8042503Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:58.8050815Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:29:58.8051521Z warnings.warn(msg, FutureWarning) 2022-05-18T04:29:58.8140265Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:58.8141864Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:29:59.1728296Z ok (2.931s) 2022-05-18T04:29:59.1860137Z test_nested_wrapped_model_offload_false_prefetch_post_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43097 2022-05-18T04:29:59.1965257Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43098 2022-05-18T04:30:00.0819740Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppxt1_ux2 2022-05-18T04:30:00.0821096Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppxt1_ux2/_remote_module_non_scriptable.py 2022-05-18T04:30:00.1043950Z dist init r=1, world=2 2022-05-18T04:30:00.1048157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:00.1292831Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppmge43h8 2022-05-18T04:30:00.1295347Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppmge43h8/_remote_module_non_scriptable.py 2022-05-18T04:30:00.1508179Z dist init r=0, world=2 2022-05-18T04:30:00.1512499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:00.1513308Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:00.1558482Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:01.5048397Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:01.7083198Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:01.7083741Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:01.7084243Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:01.7115112Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:01.7115874Z warnings.warn( 2022-05-18T04:30:01.7116652Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:01.7117194Z warnings.warn( 2022-05-18T04:30:01.7596169Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:30:01.7596886Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:01.7597820Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:01.7598474Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:01.7686657Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:01.7688838Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:02.1043968Z ok (2.931s) 2022-05-18T04:30:02.1178344Z test_nested_wrapped_model_offload_false_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43180 2022-05-18T04:30:02.1292244Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43181 2022-05-18T04:30:03.0271530Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt3cic2uw 2022-05-18T04:30:03.0272810Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt3cic2uw/_remote_module_non_scriptable.py 2022-05-18T04:30:03.0495996Z dist init r=1, world=2 2022-05-18T04:30:03.0500373Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:03.0765525Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbm640tn8 2022-05-18T04:30:03.0768168Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbm640tn8/_remote_module_non_scriptable.py 2022-05-18T04:30:03.0981638Z dist init r=0, world=2 2022-05-18T04:30:03.0985845Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:03.0987038Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:03.1010564Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:04.4492351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:04.4492883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:04.6482986Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:04.6492483Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:04.6514690Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:04.6515266Z warnings.warn( 2022-05-18T04:30:04.6526075Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:04.6526628Z warnings.warn( 2022-05-18T04:30:04.7011642Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:04.7012336Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:04.7015578Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:04.7016245Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:04.7107350Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:04.7108461Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:05.0369029Z ok (2.932s) 2022-05-18T04:30:05.0499278Z test_nested_wrapped_model_offload_false_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43263 2022-05-18T04:30:05.0603674Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43264 2022-05-18T04:30:05.9569825Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpipimkyr_ 2022-05-18T04:30:05.9571011Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpipimkyr_/_remote_module_non_scriptable.py 2022-05-18T04:30:05.9796469Z dist init r=0, world=2 2022-05-18T04:30:05.9800804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:05.9924625Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv0qfsb2b 2022-05-18T04:30:05.9927628Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv0qfsb2b/_remote_module_non_scriptable.py 2022-05-18T04:30:06.0146308Z dist init r=1, world=2 2022-05-18T04:30:06.0150551Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:06.0151540Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:06.0209440Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:07.3580442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:07.5587356Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:07.5588141Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:07.5589002Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:07.5619244Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:07.5619989Z warnings.warn( 2022-05-18T04:30:07.5620776Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:30:07.5621323Z warnings.warn( 2022-05-18T04:30:07.5971587Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:07.5972557Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:07.5978464Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:07.5979413Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:07.6069544Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:07.6071289Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:07.9681378Z ok (2.931s) 2022-05-18T04:30:07.9814864Z test_nested_wrapped_model_offload_false_prefetch_pre_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43346 2022-05-18T04:30:07.9920536Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43347 2022-05-18T04:30:08.8803469Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6fhesmvv 2022-05-18T04:30:08.8804554Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6fhesmvv/_remote_module_non_scriptable.py 2022-05-18T04:30:08.8913686Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgw18nyuh 2022-05-18T04:30:08.8916385Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgw18nyuh/_remote_module_non_scriptable.py 2022-05-18T04:30:08.9028249Z dist init r=1, world=2 2022-05-18T04:30:08.9032622Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:08.9130577Z dist init r=0, world=2 2022-05-18T04:30:08.9134736Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:08.9135815Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:08.9136527Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:10.2622254Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:10.2622954Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:10.4643653Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:10.4653162Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:10.4675340Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:30:10.4676238Z warnings.warn( 2022-05-18T04:30:10.4686647Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:10.4687230Z warnings.warn( 2022-05-18T04:30:10.5186354Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:10.5187038Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:10.5189776Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:10.5190472Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:10.5281012Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:10.5282161Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:10.8998950Z ok (2.932s) 2022-05-18T04:30:10.9131653Z test_nested_wrapped_model_offload_false_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43429 2022-05-18T04:30:10.9236409Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43430 2022-05-18T04:30:11.8232839Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn103eg2t 2022-05-18T04:30:11.8233701Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn103eg2t/_remote_module_non_scriptable.py 2022-05-18T04:30:11.8323472Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjrf664t2 2022-05-18T04:30:11.8326370Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjrf664t2/_remote_module_non_scriptable.py 2022-05-18T04:30:11.8450051Z dist init r=0, world=2 2022-05-18T04:30:11.8454124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:11.8549232Z dist init r=1, world=2 2022-05-18T04:30:11.8553744Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:11.8555062Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:11.8557408Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:13.2131669Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:13.2132252Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:13.4109691Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:13.4117584Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
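Editor's note: the UserWarning from fully_sharded_data_parallel.py:911 repeated throughout this block means the module was handed to FSDP while still on CPU, so FSDP temporarily moves it to the rank's GPU and back. A hedged sketch of one way to avoid that round-trip is below; the rank argument and module are placeholders, not from the test.

```python
# Sketch only: place the module on this rank's GPU before wrapping so FSDP does
# not have to shuttle a CPU module back and forth, as the warning describes.
# Assumes torch.distributed is already initialized, as in the test harness.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_gpu(module: nn.Module, rank: int) -> FSDP:
    module = module.to(torch.device("cuda", rank))
    return FSDP(module)
```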
2022-05-18T04:30:13.4141018Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:13.4141597Z warnings.warn( 2022-05-18T04:30:13.4150475Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:13.4151026Z warnings.warn( 2022-05-18T04:30:13.4625710Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:13.4626739Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:13.4628263Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:13.4628983Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:13.4715929Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:13.4716435Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:13.8314027Z ok (2.931s) 2022-05-18T04:30:13.8446964Z test_nested_wrapped_model_offload_true_none_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43512 2022-05-18T04:30:13.8555683Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43513 2022-05-18T04:30:14.7465972Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppg7j6n0w 2022-05-18T04:30:14.7467253Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppg7j6n0w/_remote_module_non_scriptable.py 2022-05-18T04:30:14.7480367Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0zysp1y7 2022-05-18T04:30:14.7483020Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0zysp1y7/_remote_module_non_scriptable.py 2022-05-18T04:30:14.7690607Z dist init r=0, world=2 2022-05-18T04:30:14.7695347Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:14.7698357Z dist init r=1, world=2 2022-05-18T04:30:14.7702709Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:14.7704038Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:14.7798993Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:16.1281732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:16.1282657Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:16.3292115Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:30:16.3292656Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:16.3324654Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:16.3325241Z warnings.warn( 2022-05-18T04:30:16.3326002Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:16.3326545Z warnings.warn( 2022-05-18T04:30:16.3434731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:16.3435231Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:16.3464159Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:16.3466122Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:16.3467388Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:16.3468663Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:16.3469931Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:16.3471195Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:16.3472456Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:30:16.3473700Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:16.3942570Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:16.3943290Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:16.3946802Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:16.3947491Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:16.4036503Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:16.4037004Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:16.4159960Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:16.4161683Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:16.7632354Z ok (2.932s) 2022-05-18T04:30:16.7765044Z test_nested_wrapped_model_offload_true_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43595 2022-05-18T04:30:16.7869148Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43596 2022-05-18T04:30:17.6805706Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsjyvocf8 2022-05-18T04:30:17.6807245Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsjyvocf8/_remote_module_non_scriptable.py 2022-05-18T04:30:17.7028531Z dist init r=1, world=2 2022-05-18T04:30:17.7032951Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:17.7133078Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj7dghrk7 2022-05-18T04:30:17.7135621Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj7dghrk7/_remote_module_non_scriptable.py 2022-05-18T04:30:17.7348913Z dist init r=0, world=2 2022-05-18T04:30:17.7353065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:17.7354311Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
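Editor's note: the repeated "dist init r=N, world=2" lines and the store-based barrier messages come from each spawned test process joining a two-rank process group. A minimal sketch of that per-rank setup follows; the rendezvous address and port are placeholder values, not taken from the log.

```python
# Illustrative two-rank setup; MASTER_ADDR/MASTER_PORT are made-up values.
import os
import torch.distributed as dist

def init_rank(rank: int, world_size: int = 2) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # init_process_group performs the store-based barrier logged above
    # ("Completed store-based barrier ... with 2 nodes").
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
```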
2022-05-18T04:30:17.7441843Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:19.0824642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:19.0825178Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:19.2866589Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:19.2867180Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:19.2898781Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:19.2899374Z warnings.warn( 2022-05-18T04:30:19.2900455Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:19.2901020Z warnings.warn( 2022-05-18T04:30:19.3013296Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:19.3013820Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:19.3042077Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:19.3043512Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:19.3044789Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:19.3046293Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:19.3047551Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:19.3048810Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:19.3050072Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:19.3051333Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:19.3631560Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:19.3632255Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:19.3633398Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:19.3634047Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:19.3723783Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:19.3724672Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:19.3929336Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:30:19.3930132Z return iter(self.unbind(0)) 2022-05-18T04:30:19.3931248Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:30:19.3932025Z return iter(self.unbind(0)) 2022-05-18T04:30:19.7948238Z ok (3.031s) 2022-05-18T04:30:19.8078300Z test_nested_wrapped_model_offload_true_none_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43678 2022-05-18T04:30:19.8183199Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43679 2022-05-18T04:30:20.7134454Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq3fx1t8o 2022-05-18T04:30:20.7135574Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq3fx1t8o/_remote_module_non_scriptable.py 2022-05-18T04:30:20.7165539Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp64c190a8 2022-05-18T04:30:20.7168755Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp64c190a8/_remote_module_non_scriptable.py 2022-05-18T04:30:20.7349353Z dist init r=0, world=2 2022-05-18T04:30:20.7353593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:20.7390367Z dist init r=1, world=2 2022-05-18T04:30:20.7394726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:20.7395959Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:20.7456996Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:22.0812975Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:22.0813538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:22.2812630Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:22.2821990Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:22.2844057Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:22.2844636Z warnings.warn( 2022-05-18T04:30:22.2855656Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:22.2856187Z warnings.warn( 2022-05-18T04:30:22.2970445Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:22.2971468Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:22.2998762Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:22.3000096Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:22.3001360Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:22.3002633Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:22.3003908Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:22.3005298Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:22.3006563Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:22.3007855Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:22.3573816Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:22.3574505Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:22.3577018Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:22.3577672Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:22.3667092Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:22.3668795Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:22.3867418Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 
2022-05-18T04:30:22.3868238Z return iter(self.unbind(0)) 2022-05-18T04:30:22.3869359Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:30:22.3870129Z return iter(self.unbind(0)) 2022-05-18T04:30:22.7259800Z ok (2.931s) 2022-05-18T04:30:22.7391735Z test_nested_wrapped_model_offload_true_prefetch_post_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43761 2022-05-18T04:30:22.7497184Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43762 2022-05-18T04:30:23.6499374Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpywhgltso 2022-05-18T04:30:23.6500996Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpywhgltso/_remote_module_non_scriptable.py 2022-05-18T04:30:23.6727146Z dist init r=0, world=2 2022-05-18T04:30:23.6731731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:23.6758539Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptswc1p5z 2022-05-18T04:30:23.6761248Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptswc1p5z/_remote_module_non_scriptable.py 2022-05-18T04:30:23.6974952Z dist init r=1, world=2 2022-05-18T04:30:23.6979113Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:23.6980298Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:23.7038881Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:25.0415304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:25.0415836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:25.2425194Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:25.2425758Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:25.2457852Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:25.2458432Z warnings.warn( 2022-05-18T04:30:25.2459192Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:25.2459738Z warnings.warn( 2022-05-18T04:30:25.2568125Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:25.2568628Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
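Note on the repeated "Reducer buckets have been rebuilt in this iteration." INFO lines: that message is emitted by torch.nn.parallel.distributed once DDP has sized its gradient buckets. A minimal, illustrative sketch of the wrapping that produces it (assumes a process group is already initialized and that each rank owns one CUDA device, as in this test harness; the function name is ours, not the test's):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_with_ddp(model: torch.nn.Module, rank: int) -> DDP:
    # Move the replica onto this rank's GPU before wrapping.
    model = model.cuda(rank)
    ddp_model = DDP(model, device_ids=[rank])
    # After the first training iteration(s) DDP finalizes its gradient
    # buckets and logs "Reducer buckets have been rebuilt in this iteration."
    return ddp_model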
2022-05-18T04:30:25.2597467Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:25.2599069Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:25.2600341Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:25.2601630Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:25.2602893Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:25.2604153Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:25.2605542Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:25.2606814Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:25.3084214Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:25.3084916Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:25.3085822Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:30:25.3086473Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:25.3175939Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:25.3176441Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:25.3301088Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:25.3302387Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:25.6573685Z ok (2.931s) 2022-05-18T04:30:25.6705067Z test_nested_wrapped_model_offload_true_prefetch_post_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43844 2022-05-18T04:30:25.6812237Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43845 2022-05-18T04:30:26.5750360Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn9czwr2c 2022-05-18T04:30:26.5751806Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn9czwr2c/_remote_module_non_scriptable.py 2022-05-18T04:30:26.5973902Z dist init r=0, world=2 2022-05-18T04:30:26.5978358Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:26.6227228Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3laqwo4_ 2022-05-18T04:30:26.6230403Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3laqwo4_/_remote_module_non_scriptable.py 2022-05-18T04:30:26.6445660Z dist init r=1, world=2 2022-05-18T04:30:26.6450217Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:26.6451301Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:26.6489321Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:28.0010263Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:28.0010800Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:28.2030158Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:28.2038540Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:28.2061273Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:30:28.2061849Z warnings.warn( 2022-05-18T04:30:28.2071706Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:28.2072259Z warnings.warn( 2022-05-18T04:30:28.2182235Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:28.2182717Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:28.2210772Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:28.2212065Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:28.2213336Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:28.2214892Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:28.2216185Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:28.2217444Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:28.2218678Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:28.2219936Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:30:28.2782309Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:28.2782989Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:28.2784181Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:28.2784845Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:28.2870318Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:28.2870828Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:28.3072658Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:30:28.3073440Z return iter(self.unbind(0)) 2022-05-18T04:30:28.3074580Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:30:28.3075338Z return iter(self.unbind(0)) 2022-05-18T04:30:28.6890422Z ok (3.031s) 2022-05-18T04:30:28.7023978Z test_nested_wrapped_model_offload_true_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43927 2022-05-18T04:30:28.7128501Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43928 2022-05-18T04:30:29.6004628Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8yd0luf3 2022-05-18T04:30:29.6005973Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8yd0luf3/_remote_module_non_scriptable.py 2022-05-18T04:30:29.6229191Z dist init r=1, world=2 2022-05-18T04:30:29.6233660Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:29.6300163Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy7vax046 2022-05-18T04:30:29.6302916Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy7vax046/_remote_module_non_scriptable.py 2022-05-18T04:30:29.6515563Z dist init r=0, world=2 2022-05-18T04:30:29.6519916Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:29.6521052Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
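Note on the "dist init r=N, world=2" and "store_based_barrier_key" INFO lines: they correspond to per-rank process-group initialization. A hedged two-rank sketch of that setup (MASTER_ADDR/MASTER_PORT values and the function name are placeholders, not taken from the test code):

import os
import torch.distributed as dist

def init_worker(rank: int, world_size: int = 2) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # init_process_group ends with a store-based barrier, which is what
    # produces the "Added key: store_based_barrier_key:1" and
    # "Completed store-based barrier ... with 2 nodes." messages above.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    dist.destroy_process_group()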
2022-05-18T04:30:29.6540682Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:31.0119976Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:31.0120508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:31.2103961Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:31.2104750Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:31.2135371Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:31.2135952Z warnings.warn( 2022-05-18T04:30:31.2136694Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:31.2137237Z warnings.warn( 2022-05-18T04:30:31.2247820Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:31.2248329Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:31.2275703Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:31.2277016Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:31.2278290Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:31.2279553Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:31.2280818Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:31.2282225Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:31.2283506Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:31.2284764Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:31.2840009Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:31.2840821Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:31.2841759Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:31.2842408Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:31.2929689Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:31.2930177Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:31.3128016Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:30:31.3128823Z return iter(self.unbind(0)) 2022-05-18T04:30:31.3129950Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:30:31.3130718Z return iter(self.unbind(0)) 2022-05-18T04:30:31.6206119Z ok (2.931s) 2022-05-18T04:30:31.6335753Z test_nested_wrapped_model_offload_true_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44010 2022-05-18T04:30:31.6439815Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44011 2022-05-18T04:30:32.5968233Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpidih4cpk 2022-05-18T04:30:32.5969135Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpidih4cpk/_remote_module_non_scriptable.py 2022-05-18T04:30:32.5989528Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzpt0a119 2022-05-18T04:30:32.5992631Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzpt0a119/_remote_module_non_scriptable.py 2022-05-18T04:30:32.6184639Z dist init r=1, world=2 2022-05-18T04:30:32.6189473Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:32.6210265Z dist init r=0, world=2 2022-05-18T04:30:32.6214491Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:32.6215292Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:32.6292981Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:33.9543983Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:33.9544783Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:34.1547605Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:34.1555244Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:34.1579425Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:34.1580007Z warnings.warn( 2022-05-18T04:30:34.1587442Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:34.1588305Z warnings.warn( 2022-05-18T04:30:34.1692892Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:34.1693414Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:34.1722216Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:34.1723537Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:34.1724820Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:34.1726089Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:34.1727357Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:34.1728627Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:34.1730004Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:34.1731288Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:34.2201155Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:34.2201849Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:34.2204923Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:34.2205600Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:34.2291332Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:34.2291847Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:34.2411713Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:34.2413007Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:34.5525411Z ok (2.932s) 2022-05-18T04:30:34.5658406Z test_nested_wrapped_model_offload_true_prefetch_pre_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44093 2022-05-18T04:30:34.5763469Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44094 2022-05-18T04:30:35.4760328Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv8awmige 2022-05-18T04:30:35.4761553Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv8awmige/_remote_module_non_scriptable.py 2022-05-18T04:30:35.4927709Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq7rickm5 2022-05-18T04:30:35.4930493Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq7rickm5/_remote_module_non_scriptable.py 2022-05-18T04:30:35.4975610Z dist init r=0, world=2 2022-05-18T04:30:35.4979807Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:35.5156427Z dist init r=1, world=2 2022-05-18T04:30:35.5161140Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:35.5162129Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:35.5184984Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:36.8492514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:36.8493037Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:37.0464466Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:37.0465023Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:37.0496070Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:37.0496675Z warnings.warn( 2022-05-18T04:30:37.0497438Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:37.0497973Z warnings.warn( 2022-05-18T04:30:37.0611276Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:37.0611974Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:37.0639806Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:30:37.0641993Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:37.0643267Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:37.0644529Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:37.0645790Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:37.0647045Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:37.0648286Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:37.0649541Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:37.1231668Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:37.1232383Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:37.1234448Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:37.1235126Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:37.1324554Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:37.1325064Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
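Note on the recurring python_variable.cpp:205 warning: it concerns tensors that were reached again through a weak reference without the internal fix-up the warning names. A purely illustrative sketch of the weak-reference pattern being described (this does not reproduce the warning; _fix_weakref() is the internal Tensor method the warning itself mentions):

import weakref
import torch

t = torch.ones(3)
ref = weakref.ref(t)   # take a weak reference to the tensor
same = ref()           # dereference it back to a Tensor object
# Per the warning text, code that revives a Tensor through a weak reference
# like this is expected to call same._fix_weakref() afterwards; otherwise a
# later deallocation can leave live PyObject references whose accesses fail.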
2022-05-18T04:30:37.1533179Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:30:37.1533979Z return iter(self.unbind(0)) 2022-05-18T04:30:37.1535118Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:30:37.1536021Z return iter(self.unbind(0)) 2022-05-18T04:30:37.4839174Z ok (2.931s) 2022-05-18T04:30:37.4971246Z test_nested_wrapped_model_offload_true_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44176 2022-05-18T04:30:37.5078931Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44177 2022-05-18T04:30:38.4507666Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwf3ci85x 2022-05-18T04:30:38.4508727Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwf3ci85x/_remote_module_non_scriptable.py 2022-05-18T04:30:38.4661222Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmqdzx8sh 2022-05-18T04:30:38.4664120Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmqdzx8sh/_remote_module_non_scriptable.py 2022-05-18T04:30:38.4723679Z dist init r=0, world=2 2022-05-18T04:30:38.4727388Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:38.4885298Z dist init r=1, world=2 2022-05-18T04:30:38.4889804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:38.4890864Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:38.4933678Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:39.8403057Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:39.8403590Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:40.0424764Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:40.0433616Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:40.0456344Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:30:40.0457031Z warnings.warn( 2022-05-18T04:30:40.0467036Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:40.0467592Z warnings.warn( 2022-05-18T04:30:40.0577947Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:40.0578432Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:40.0606375Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:40.0607746Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:40.0609200Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:40.0610469Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:40.0611727Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:40.0612984Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:40.0614237Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:30:40.0615471Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:30:40.1176374Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:40.1177067Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:40.1178141Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:40.1178809Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:40.1263833Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:40.1264344Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:30:40.1461508Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:30:40.1462345Z return iter(self.unbind(0)) 2022-05-18T04:30:40.1463509Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:30:40.1464781Z return iter(self.unbind(0)) 2022-05-18T04:30:40.5158013Z ok (3.032s) 2022-05-18T04:30:40.5294636Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_None_mixed_precision_False (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44259 2022-05-18T04:30:40.5399798Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44260 2022-05-18T04:30:41.4423912Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt3hdb2wu 2022-05-18T04:30:41.4425602Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt3hdb2wu/_remote_module_non_scriptable.py 2022-05-18T04:30:41.4649821Z dist init r=1, world=2 2022-05-18T04:30:41.4654545Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:41.4806398Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq2541l7f 2022-05-18T04:30:41.4809057Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq2541l7f/_remote_module_non_scriptable.py 2022-05-18T04:30:41.5020304Z dist init r=0, world=2 2022-05-18T04:30:41.5024751Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:41.5026005Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
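Note on the FutureWarning from torch/testing/_deprecated.py above: it asks callers to migrate from torch.testing.assert_allclose() to torch.testing.assert_close() (pytorch/pytorch#61844). A minimal migration sketch with illustrative values:

import torch

actual = torch.tensor([1.0, 2.0])
expected = torch.tensor([1.0, 2.0])

# Deprecated since 1.12, slated for removal in 1.14:
# torch.testing.assert_allclose(actual, expected)

# Current replacement suggested by the warning:
torch.testing.assert_close(actual, expected)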
2022-05-18T04:30:41.5062953Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:42.8666530Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:42.8667061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:43.0734602Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:43.0735227Z warnings.warn( 2022-05-18T04:30:43.0745302Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:43.0745874Z warnings.warn( 2022-05-18T04:30:43.1059094Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:43.1059944Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:43.1061232Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:43.1061912Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:43.4477087Z ok (2.932s) 2022-05-18T04:30:43.4610580Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_None_mixed_precision_True (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44342 2022-05-18T04:30:43.4715559Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44343 2022-05-18T04:30:44.4056878Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyi21fc4g 2022-05-18T04:30:44.4058118Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyi21fc4g/_remote_module_non_scriptable.py 2022-05-18T04:30:44.4193247Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmcrhrkbt 2022-05-18T04:30:44.4196435Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmcrhrkbt/_remote_module_non_scriptable.py 2022-05-18T04:30:44.4277541Z dist init r=1, world=2 2022-05-18T04:30:44.4281932Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:44.4418267Z dist init r=0, world=2 2022-05-18T04:30:44.4422914Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:44.4424350Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:44.4487324Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:30:45.7841782Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:45.7842289Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:45.9833813Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:45.9834381Z warnings.warn( 2022-05-18T04:30:45.9884590Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:45.9885139Z warnings.warn( 2022-05-18T04:30:46.0230967Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:46.0231661Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:46.0233892Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:46.0234571Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:46.3792069Z ok (2.931s) 2022-05-18T04:30:46.3925161Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_NO_SHARD_mixed_precision_False (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44425 2022-05-18T04:30:46.4029855Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44426 2022-05-18T04:30:47.3001269Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpekguhia4 2022-05-18T04:30:47.3002625Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpekguhia4/_remote_module_non_scriptable.py 2022-05-18T04:30:47.3226478Z dist init r=1, world=2 2022-05-18T04:30:47.3230941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:47.3384764Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp43krx9sp 2022-05-18T04:30:47.3387678Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp43krx9sp/_remote_module_non_scriptable.py 2022-05-18T04:30:47.3599458Z dist init r=0, world=2 2022-05-18T04:30:47.3603730Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:47.3604669Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:47.3639311Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:30:48.6974464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:48.6975009Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:48.8965578Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:48.8966190Z warnings.warn( 2022-05-18T04:30:48.8971369Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:48.8971911Z warnings.warn( 2022-05-18T04:30:48.9205129Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:48.9205820Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:48.9211886Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:48.9212565Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:49.2105565Z ok (2.831s) 2022-05-18T04:30:49.2241823Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_NO_SHARD_mixed_precision_True (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44508 2022-05-18T04:30:49.2349449Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44509 2022-05-18T04:30:50.1952282Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbah55pm9 2022-05-18T04:30:50.1953410Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbah55pm9/_remote_module_non_scriptable.py 2022-05-18T04:30:50.2083980Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmzbomxkr 2022-05-18T04:30:50.2086819Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmzbomxkr/_remote_module_non_scriptable.py 2022-05-18T04:30:50.2169482Z dist init r=0, world=2 2022-05-18T04:30:50.2173570Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:50.2309954Z dist init r=1, world=2 2022-05-18T04:30:50.2314308Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:50.2315356Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:50.2378846Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:30:51.5752457Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:51.5753018Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:51.7808144Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:51.7808756Z warnings.warn( 2022-05-18T04:30:51.7833996Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:51.7834854Z warnings.warn( 2022-05-18T04:30:51.8090265Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:51.8090947Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:51.8095703Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:51.8096374Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:52.1426415Z ok (2.932s) 2022-05-18T04:30:52.1562537Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_mixed_precision_False (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44591 2022-05-18T04:30:52.1667909Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44592 2022-05-18T04:30:53.0845590Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwy4vcuoc 2022-05-18T04:30:53.0846484Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwy4vcuoc/_remote_module_non_scriptable.py 2022-05-18T04:30:53.1008033Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsqf9ydcx 2022-05-18T04:30:53.1010754Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsqf9ydcx/_remote_module_non_scriptable.py 2022-05-18T04:30:53.1066243Z dist init r=1, world=2 2022-05-18T04:30:53.1070271Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:53.1231380Z dist init r=0, world=2 2022-05-18T04:30:53.1235440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:53.1236659Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:53.1275518Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
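The test names above enumerate FSDP configurations: CPUOffload(offload_params=...), a ShardingStrategy (NO_SHARD, SHARD_GRAD_OP), and mixed precision on or off. A hedged sketch of how such a configuration is typically built with the public FSDP API, assuming a PyTorch build where torch.distributed.fsdp exposes CPUOffload, ShardingStrategy and MixedPrecision; the model is a placeholder, not the nested wrapped model the tests use:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import (
        FullyShardedDataParallel as FSDP,
        CPUOffload,
        ShardingStrategy,
        MixedPrecision,
    )

    # Placeholder module; requires an initialized default process group, as above.
    model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))

    # One of the parameterizations seen in the test names:
    # offload_params=False, SHARD_GRAD_OP, mixed precision enabled.
    fsdp_model = FSDP(
        model,
        cpu_offload=CPUOffload(offload_params=False),
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
        mixed_precision=MixedPrecision(param_dtype=torch.float16),
    )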
2022-05-18T04:30:54.4930642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:54.4931494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:54.6956765Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:54.6957416Z warnings.warn( 2022-05-18T04:30:54.6972169Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:54.6972724Z warnings.warn( 2022-05-18T04:30:54.7286961Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:54.7287642Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:54.7288571Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:54.7289503Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:55.0748555Z ok (2.932s) 2022-05-18T04:30:55.0884622Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_mixed_precision_True (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44674 2022-05-18T04:30:55.0990834Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44675 2022-05-18T04:30:56.0583168Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyjx8g_u8 2022-05-18T04:30:56.0584569Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyjx8g_u8/_remote_module_non_scriptable.py 2022-05-18T04:30:56.0639746Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp27wiyalh 2022-05-18T04:30:56.0642397Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp27wiyalh/_remote_module_non_scriptable.py 2022-05-18T04:30:56.0806114Z dist init r=1, world=2 2022-05-18T04:30:56.0810345Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:56.0866408Z dist init r=0, world=2 2022-05-18T04:30:56.0870621Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:56.0871579Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:56.0914279Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
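The "dist init r=N, world=2" lines and the store-based barrier INFO messages above are produced while each test process initializes its process group. A minimal sketch of that step; the rendezvous address is a placeholder and the NCCL backend is an assumption (GPU tests commonly use it), not something the log states:

    import torch.distributed as dist

    # Values mirror the "dist init r=0, world=2" lines above.
    dist.init_process_group(
        backend="nccl",                       # assumption: typical backend for GPU tests
        init_method="tcp://127.0.0.1:29500",  # placeholder rendezvous address
        rank=0,
        world_size=2,
    )
    # init_process_group finishes with the store-based barrier whose INFO lines appear above:
    # "Added key: store_based_barrier_key:1 ..." then
    # "Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes."
    dist.destroy_process_group()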
2022-05-18T04:30:57.4590365Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:57.4590921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:57.6611403Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:57.6612007Z warnings.warn( 2022-05-18T04:30:57.6629037Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:30:57.6629594Z warnings.warn( 2022-05-18T04:30:57.6973250Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:57.6973948Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:57.6974878Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:30:57.6975531Z warnings.warn(msg, FutureWarning) 2022-05-18T04:30:58.0068975Z ok (2.932s) 2022-05-18T04:30:58.0205104Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_None_mixed_precision_False (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44757 2022-05-18T04:30:58.0309871Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44758 2022-05-18T04:30:58.9243943Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp92b0oj8z 2022-05-18T04:30:58.9244797Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp92b0oj8z/_remote_module_non_scriptable.py 2022-05-18T04:30:58.9265451Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt3p7aq5m 2022-05-18T04:30:58.9268643Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt3p7aq5m/_remote_module_non_scriptable.py 2022-05-18T04:30:58.9457305Z dist init r=1, world=2 2022-05-18T04:30:58.9461338Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:30:58.9491602Z dist init r=0, world=2 2022-05-18T04:30:58.9495844Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:30:58.9496942Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:30:58.9564879Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:31:00.3029723Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:00.3030290Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:00.5060848Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:00.5061459Z warnings.warn( 2022-05-18T04:31:00.5062224Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:00.5062763Z warnings.warn( 2022-05-18T04:31:00.5174918Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:00.5176233Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:00.5177783Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:00.5179079Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:00.5180350Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:00.5181594Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:00.5182850Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:00.5184417Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:00.5527112Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:00.5527797Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:00.5529255Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:00.5529921Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:00.9388910Z ok (2.932s) 2022-05-18T04:31:00.9525702Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_None_mixed_precision_True (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44840 2022-05-18T04:31:00.9635817Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44841 2022-05-18T04:31:01.8677025Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7wc4zhzs 2022-05-18T04:31:01.8677843Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7wc4zhzs/_remote_module_non_scriptable.py 2022-05-18T04:31:01.8698299Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd_5_1z75 2022-05-18T04:31:01.8701009Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd_5_1z75/_remote_module_non_scriptable.py 2022-05-18T04:31:01.8895323Z dist init r=1, world=2 2022-05-18T04:31:01.8899483Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:01.8924181Z dist init r=0, world=2 2022-05-18T04:31:01.8928628Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:01.8930135Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:01.9003173Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:03.2409295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:03.2409827Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:03.4442819Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:03.4443422Z warnings.warn( 2022-05-18T04:31:03.4449578Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:31:03.4450166Z warnings.warn( 2022-05-18T04:31:03.4565522Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:03.4567139Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:03.4568434Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:03.4569699Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:03.4570954Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:03.4572225Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:03.4573488Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:03.4574748Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:03.4971065Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:03.4972001Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:03.4973680Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:03.4974364Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:03.4978622Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:03.4982741Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:03.8713172Z ok (2.932s) 2022-05-18T04:31:03.8848413Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_NO_SHARD_mixed_precision_False (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44923 2022-05-18T04:31:03.8954143Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44924 2022-05-18T04:31:04.8119370Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7kt25a2u 2022-05-18T04:31:04.8120225Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7kt25a2u/_remote_module_non_scriptable.py 2022-05-18T04:31:04.8320500Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyr9hvbcn 2022-05-18T04:31:04.8323361Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyr9hvbcn/_remote_module_non_scriptable.py 2022-05-18T04:31:04.8343530Z dist init r=1, world=2 2022-05-18T04:31:04.8348197Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:04.8537772Z dist init r=0, world=2 2022-05-18T04:31:04.8542138Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:04.8543019Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:04.8553390Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:06.2082507Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:06.2083484Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:06.4111893Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:06.4113080Z warnings.warn( 2022-05-18T04:31:06.4128778Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:06.4129859Z warnings.warn( 2022-05-18T04:31:06.4248035Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:06.4250600Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:06.4253259Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:06.4255934Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:06.4258530Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:06.4261402Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:06.4264344Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:06.4266852Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:06.4557543Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:06.4558979Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:06.4560910Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:31:06.4562149Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:06.8033268Z ok (2.932s) 2022-05-18T04:31:06.8168850Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_NO_SHARD_mixed_precision_True (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45006 2022-05-18T04:31:06.8276252Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45007 2022-05-18T04:31:07.7291986Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnf45yvz9 2022-05-18T04:31:07.7293339Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnf45yvz9/_remote_module_non_scriptable.py 2022-05-18T04:31:07.7306534Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz146vh2q 2022-05-18T04:31:07.7309278Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz146vh2q/_remote_module_non_scriptable.py 2022-05-18T04:31:07.7517062Z dist init r=1, world=2 2022-05-18T04:31:07.7521703Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:07.7525270Z dist init r=0, world=2 2022-05-18T04:31:07.7529476Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:07.7530407Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:07.7625664Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:09.0990929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:09.0991518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:09.2999344Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:09.3000301Z warnings.warn( 2022-05-18T04:31:09.3025619Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:09.3026157Z warnings.warn( 2022-05-18T04:31:09.3139436Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:31:09.3140232Z param.add_(d_p, alpha=alpha) 2022-05-18T04:31:09.3141364Z /opt/conda/lib/python3.9/site-packages/torch/optim/sgd.py:241: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 
2022-05-18T04:31:09.3142154Z param.add_(d_p, alpha=alpha) 2022-05-18T04:31:09.3479437Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:09.3480135Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:09.3482437Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:09.3483118Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:09.3714822Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:31:09.3715845Z return iter(self.unbind(0)) 2022-05-18T04:31:09.3716978Z /opt/conda/lib/python3.9/site-packages/torch/_tensor.py:732: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:31:09.3717756Z return iter(self.unbind(0)) 2022-05-18T04:31:09.7353289Z ok (2.932s) 2022-05-18T04:31:09.7487937Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_mixed_precision_False (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45089 2022-05-18T04:31:09.7592648Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45090 2022-05-18T04:31:10.6556278Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbmr93ro1 2022-05-18T04:31:10.6557133Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbmr93ro1/_remote_module_non_scriptable.py 2022-05-18T04:31:10.6574036Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpps7najhp 2022-05-18T04:31:10.6577235Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpps7najhp/_remote_module_non_scriptable.py 2022-05-18T04:31:10.6770343Z dist init r=0, world=2 2022-05-18T04:31:10.6774549Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:10.6802543Z dist init r=1, world=2 2022-05-18T04:31:10.6806903Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:10.6808189Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:10.6878217Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
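The warnings above surface torch/optim/sgd.py (param.add_(d_p, alpha=alpha)), so the parity check is running an SGD update, and the "single_iteration" test names suggest a single forward/backward/step pass. A hedged, stand-alone sketch of such an iteration with placeholder model and data, not the test's actual code:

    import torch
    import torch.nn as nn

    model = nn.Linear(8, 1)  # placeholder; the tests use a nested wrapped model
    optim = torch.optim.SGD(model.parameters(), lr=0.01)

    inp = torch.randn(4, 8)
    loss = model(inp).sum()
    loss.backward()
    optim.step()       # the traced param.add_(d_p, alpha=...) line runs inside this call
    optim.zero_grad()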
2022-05-18T04:31:12.0385447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:12.0385997Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:12.2427181Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:12.2427796Z warnings.warn( 2022-05-18T04:31:12.2464144Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:12.2464761Z warnings.warn( 2022-05-18T04:31:12.2582724Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:12.2584233Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:12.2585813Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:12.2587112Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:12.2588373Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:12.2589625Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:12.2590880Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:12.2592255Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:12.2940120Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:12.2940810Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:12.2943216Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:12.2944031Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:12.6671633Z ok (2.932s) 2022-05-18T04:31:12.6805471Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_mixed_precision_True (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45172 2022-05-18T04:31:12.6913803Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45173 2022-05-18T04:31:13.5887759Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5zj7m5nt 2022-05-18T04:31:13.5888419Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5zj7m5nt/_remote_module_non_scriptable.py 2022-05-18T04:31:13.5930115Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpllfc98f6 2022-05-18T04:31:13.5932867Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpllfc98f6/_remote_module_non_scriptable.py 2022-05-18T04:31:13.6104798Z dist init r=0, world=2 2022-05-18T04:31:13.6109428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:13.6157567Z dist init r=1, world=2 2022-05-18T04:31:13.6161966Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:13.6163240Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:13.6213337Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:14.9721173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:14.9721713Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:15.1751323Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:15.1751929Z warnings.warn( 2022-05-18T04:31:15.1752709Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:31:15.1753255Z warnings.warn( 2022-05-18T04:31:15.1859257Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:15.1860915Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:15.1862184Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:15.1863457Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:15.1864963Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:15.1866247Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:15.1867509Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:15.1868767Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:15.2245602Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:15.2246307Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:15.2247658Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:15.2248378Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:15.2253735Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:15.2255661Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:31:15.5990706Z ok (2.932s) 2022-05-18T04:31:15.6120449Z test_transformer_parameterized_offload_false_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45255 2022-05-18T04:31:15.6225100Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45256 2022-05-18T04:31:16.6033839Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqnk1o3vn 2022-05-18T04:31:16.6034924Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqnk1o3vn/_remote_module_non_scriptable.py 2022-05-18T04:31:16.6052042Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_kjc7w93 2022-05-18T04:31:16.6054875Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_kjc7w93/_remote_module_non_scriptable.py 2022-05-18T04:31:16.6247585Z dist init r=0, world=2 2022-05-18T04:31:16.6251584Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:16.6273146Z dist init r=1, world=2 2022-05-18T04:31:16.6277328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:16.6278509Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:16.6355098Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:17.9557476Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:17.9558025Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:18.7373673Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:18.7394181Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:18.7637300Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:18.7637889Z warnings.warn( 2022-05-18T04:31:18.7662708Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:31:18.7663241Z warnings.warn( 2022-05-18T04:31:18.8346894Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:18.8347606Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:18.8354333Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:18.8355008Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:18.8912837Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:18.8916745Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:19.3320506Z ok (3.733s) 2022-05-18T04:31:19.3453873Z test_transformer_parameterized_offload_false_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45338 2022-05-18T04:31:19.3561311Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45339 2022-05-18T04:31:20.2464052Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphoqtzywd 2022-05-18T04:31:20.2465270Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphoqtzywd/_remote_module_non_scriptable.py 2022-05-18T04:31:20.2688713Z dist init r=1, world=2 2022-05-18T04:31:20.2692759Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:20.2826824Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxywk0goq 2022-05-18T04:31:20.2829551Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxywk0goq/_remote_module_non_scriptable.py 2022-05-18T04:31:20.3042960Z dist init r=0, world=2 2022-05-18T04:31:20.3047226Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:20.3048310Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:20.3101504Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:21.6445092Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:21.6445648Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:22.4264026Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:22.4285843Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:22.4523272Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:31:22.4562002Z warnings.warn( 2022-05-18T04:31:22.4562902Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:22.4563455Z warnings.warn( 2022-05-18T04:31:22.5252772Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:22.5253468Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:22.5268899Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:22.5269658Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:22.5833815Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:22.5849911Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:23.0654006Z ok (3.733s) 2022-05-18T04:31:23.0788096Z test_transformer_parameterized_offload_false_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45421 2022-05-18T04:31:23.0893495Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45422 2022-05-18T04:31:23.9866798Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0dmgfnij 2022-05-18T04:31:23.9867918Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0dmgfnij/_remote_module_non_scriptable.py 2022-05-18T04:31:24.0082109Z dist init r=1, world=2 2022-05-18T04:31:24.0086278Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:24.0335213Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj_g8siyd 2022-05-18T04:31:24.0338075Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj_g8siyd/_remote_module_non_scriptable.py 2022-05-18T04:31:24.0560866Z dist init r=0, world=2 2022-05-18T04:31:24.0565640Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:24.0566456Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:24.0596096Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:25.4013335Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:25.4013884Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:26.1812296Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:26.1812848Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
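The transformer parity tests above are parameterized over gradient-norm clipping (clip_norm_type_2_0 versus clip_norm_type_None). A hedged sketch of 2-norm clipping with the stock utility; whether the test uses this helper or an FSDP-specific clipping method is not visible in the log:

    import torch
    import torch.nn as nn

    model = nn.Linear(8, 1)  # placeholder model
    loss = model(torch.randn(4, 8)).sum()
    loss.backward()

    # norm_type=2.0 mirrors the "clip_norm_type_2_0" parameterization; the
    # "clip_norm_type_None" variants presumably skip clipping entirely.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0, norm_type=2.0)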
2022-05-18T04:31:26.2079176Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:26.2079767Z warnings.warn( 2022-05-18T04:31:26.2083979Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:26.2084541Z warnings.warn( 2022-05-18T04:31:26.2815615Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:26.2816327Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:26.2829725Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:26.2830388Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:26.3400153Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:26.3400689Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:26.7987983Z ok (3.733s) 2022-05-18T04:31:26.8119616Z test_transformer_parameterized_offload_false_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45504 2022-05-18T04:31:26.8227422Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45505 2022-05-18T04:31:27.7186760Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp53s1qoqc 2022-05-18T04:31:27.7187848Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp53s1qoqc/_remote_module_non_scriptable.py 2022-05-18T04:31:27.7190025Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprdqq2ijr 2022-05-18T04:31:27.7193053Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprdqq2ijr/_remote_module_non_scriptable.py 2022-05-18T04:31:27.7402775Z dist init r=0, world=2 2022-05-18T04:31:27.7406981Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:27.7415574Z dist init r=1, world=2 2022-05-18T04:31:27.7419737Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:27.7420934Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:27.7510522Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:29.0748285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:29.0748817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:29.8512301Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
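The "Reducer buckets have been rebuilt in this iteration" lines come from torch.nn.parallel.distributed, i.e. the DistributedDataParallel baseline that TestParityWithDDP compares FSDP against. A hedged sketch of that baseline wrapping, assuming an initialized process group and one GPU per rank (the rank value is illustrative):

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    rank = 0  # illustrative; a real harness derives this from the process group
    model = nn.Linear(8, 8).to(torch.device("cuda", rank))

    # DDP's gradient Reducer builds, and may rebuild, its communication buckets
    # during the first iterations, which is what the INFO lines above report.
    ddp_model = DDP(model, device_ids=[rank])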
2022-05-18T04:31:29.8512885Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:29.8776716Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:29.8777339Z warnings.warn( 2022-05-18T04:31:29.8778109Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:29.8778663Z warnings.warn( 2022-05-18T04:31:29.9485966Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:29.9486675Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:29.9494516Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:29.9495198Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:30.0043892Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:30.0044789Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:30.4319459Z ok (3.633s) 2022-05-18T04:31:30.4452954Z test_transformer_parameterized_offload_false_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45587 2022-05-18T04:31:30.4558341Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45588 2022-05-18T04:31:31.3669340Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9envo6p3 2022-05-18T04:31:31.3670585Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9envo6p3/_remote_module_non_scriptable.py 2022-05-18T04:31:31.3886363Z dist init r=1, world=2 2022-05-18T04:31:31.3890442Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:31.3960667Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmple7btihr 2022-05-18T04:31:31.3963345Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmple7btihr/_remote_module_non_scriptable.py 2022-05-18T04:31:31.4178922Z dist init r=0, world=2 2022-05-18T04:31:31.4183223Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:31.4184937Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:31.4197112Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:31:32.7418118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:32.7418952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:33.5103678Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:33.5122391Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:33.5372301Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:33.5372871Z warnings.warn( 2022-05-18T04:31:33.5384747Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:33.5385311Z warnings.warn( 2022-05-18T04:31:33.6100236Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:33.6100924Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:33.6109072Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:33.6109757Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:33.6666065Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:33.6667111Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:34.1651606Z ok (3.733s) 2022-05-18T04:31:34.1783157Z test_transformer_parameterized_offload_false_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45670 2022-05-18T04:31:34.1888964Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45671 2022-05-18T04:31:35.0831068Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_53se_23 2022-05-18T04:31:35.0832085Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_53se_23/_remote_module_non_scriptable.py 2022-05-18T04:31:35.0846945Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8nmpakxj 2022-05-18T04:31:35.0850342Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8nmpakxj/_remote_module_non_scriptable.py 2022-05-18T04:31:35.1047909Z dist init r=1, world=2 2022-05-18T04:31:35.1052085Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:35.1090831Z dist init r=0, world=2 2022-05-18T04:31:35.1095659Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:35.1096792Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
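The "dist init r=<rank>, world=2" prints and the store_based_barrier_key INFO lines are produced while each of the two spawned worker processes joins a process group; init_process_group() finishes with a store-based barrier so that both ranks enter the group together. A rough sketch under assumed settings (the address, port, and backend choice are placeholders, not what the harness in torch.testing._internal.common_distributed actually uses):

    import torch
    import torch.distributed as dist

    def init_worker(rank: int, world_size: int = 2) -> None:
        print(f"dist init r={rank}, world={world_size}")
        dist.init_process_group(
            backend="nccl" if torch.cuda.is_available() else "gloo",
            init_method="tcp://127.0.0.1:29500",  # placeholder rendezvous address/port
            rank=rank,
            world_size=world_size,
        )
        # init_process_group() ends with the store-based barrier that logs
        # "Added key: store_based_barrier_key:1 ..." and "Completed store-based barrier ...".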
2022-05-18T04:31:35.1155648Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:36.4797260Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:36.4797834Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:37.2643381Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:37.2643928Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:37.2906004Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:37.2906933Z warnings.warn( 2022-05-18T04:31:37.2922001Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:37.2922558Z warnings.warn( 2022-05-18T04:31:37.3656545Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:37.3657219Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:37.3668714Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:37.3669371Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:37.4249149Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:37.4249648Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:37.8982990Z ok (3.733s) 2022-05-18T04:31:37.9114099Z test_transformer_parameterized_offload_false_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45753 2022-05-18T04:31:37.9220303Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45754 2022-05-18T04:31:38.8219162Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptvl3pgb2 2022-05-18T04:31:38.8220842Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptvl3pgb2/_remote_module_non_scriptable.py 2022-05-18T04:31:38.8455177Z dist init r=1, world=2 2022-05-18T04:31:38.8459540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:38.8594503Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr2ffcloa 2022-05-18T04:31:38.8597281Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr2ffcloa/_remote_module_non_scriptable.py 2022-05-18T04:31:38.8810371Z dist init r=0, world=2 2022-05-18T04:31:38.8814444Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:38.8815813Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:38.8867936Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:40.2371320Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:40.2371861Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:41.0161316Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:41.0188555Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:41.0428692Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:41.0429267Z warnings.warn( 2022-05-18T04:31:41.0471961Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:41.0472516Z warnings.warn( 2022-05-18T04:31:41.1178279Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:41.1178960Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:41.1189117Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:41.1189782Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:41.1764428Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:41.1768176Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:31:41.6315185Z ok (3.733s) 2022-05-18T04:31:41.6449254Z test_transformer_parameterized_offload_false_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45836 2022-05-18T04:31:41.6557344Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45837 2022-05-18T04:31:42.5976377Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6d9xqv0w 2022-05-18T04:31:42.5977197Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6d9xqv0w/_remote_module_non_scriptable.py 2022-05-18T04:31:42.6117791Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcejj4z1x 2022-05-18T04:31:42.6120483Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcejj4z1x/_remote_module_non_scriptable.py 2022-05-18T04:31:42.6200055Z dist init r=1, world=2 2022-05-18T04:31:42.6204459Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:42.6342914Z dist init r=0, world=2 2022-05-18T04:31:42.6347619Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:42.6348435Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:42.6409588Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:43.9864605Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:43.9865209Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:44.7661223Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:44.7683308Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:44.7932142Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:44.7932729Z warnings.warn( 2022-05-18T04:31:44.7950628Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:44.7951171Z warnings.warn( 2022-05-18T04:31:44.8645643Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:44.8646313Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:44.8652751Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:31:44.8653427Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:44.9219980Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:44.9220487Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:45.3652851Z ok (3.734s) 2022-05-18T04:31:45.3789319Z test_transformer_parameterized_offload_false_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45919 2022-05-18T04:31:45.3894346Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45920 2022-05-18T04:31:46.2928211Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyutk7iar 2022-05-18T04:31:46.2930078Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyutk7iar/_remote_module_non_scriptable.py 2022-05-18T04:31:46.3167281Z dist init r=1, world=2 2022-05-18T04:31:46.3171970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:46.3298566Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb9namv5v 2022-05-18T04:31:46.3301296Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb9namv5v/_remote_module_non_scriptable.py 2022-05-18T04:31:46.3516259Z dist init r=0, world=2 2022-05-18T04:31:46.3520569Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:46.3521368Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:46.3581318Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:47.7074792Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:47.7075321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:48.4873632Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:48.4897746Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:48.5142636Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:48.5143232Z warnings.warn( 2022-05-18T04:31:48.5172127Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:48.5172674Z warnings.warn( 2022-05-18T04:31:48.5910698Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:48.5911381Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:48.5921622Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:48.5922299Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:48.6482760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:48.6486354Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:49.0988558Z ok (3.733s) 2022-05-18T04:31:49.1123315Z test_transformer_parameterized_offload_false_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46002 2022-05-18T04:31:49.1229725Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46003 2022-05-18T04:31:50.0215363Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg4vdjfhv 2022-05-18T04:31:50.0216418Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg4vdjfhv/_remote_module_non_scriptable.py 2022-05-18T04:31:50.0437519Z dist init r=1, world=2 2022-05-18T04:31:50.0441038Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplz9_2euh 2022-05-18T04:31:50.0441592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:50.0444024Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplz9_2euh/_remote_module_non_scriptable.py 2022-05-18T04:31:50.0657859Z dist init r=0, world=2 2022-05-18T04:31:50.0662206Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:50.0663205Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:50.0748010Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:51.4098291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:51.4098953Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:52.1915065Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:52.1939463Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:52.2175033Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:52.2175845Z warnings.warn( 2022-05-18T04:31:52.2215279Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:52.2216062Z warnings.warn( 2022-05-18T04:31:52.2934487Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:31:52.2935322Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:52.2947808Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:52.2948726Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:52.3510090Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:52.3524679Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:52.8323628Z ok (3.733s) 2022-05-18T04:31:52.8455552Z test_transformer_parameterized_offload_false_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46085 2022-05-18T04:31:52.8560723Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46086 2022-05-18T04:31:53.7400669Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprwdzq6cf 2022-05-18T04:31:53.7401776Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprwdzq6cf/_remote_module_non_scriptable.py 2022-05-18T04:31:53.7585493Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2x3k8l20 2022-05-18T04:31:53.7588388Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2x3k8l20/_remote_module_non_scriptable.py 2022-05-18T04:31:53.7623480Z dist init r=1, world=2 2022-05-18T04:31:53.7628163Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:53.7801727Z dist init r=0, world=2 2022-05-18T04:31:53.7805777Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:53.7806835Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:53.7833314Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:55.1193394Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:55.1193936Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:55.8991949Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:55.9015083Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:55.9257338Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:55.9257926Z warnings.warn( 2022-05-18T04:31:55.9287387Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:55.9287948Z warnings.warn( 2022-05-18T04:31:56.0002193Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:56.0002902Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:56.0016148Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:56.0016816Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:56.0576686Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:56.0577384Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:56.5653654Z ok (3.733s) 2022-05-18T04:31:56.5786235Z test_transformer_parameterized_offload_false_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46168 2022-05-18T04:31:56.5893449Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46169 2022-05-18T04:31:57.4844430Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuhhm0l1g 2022-05-18T04:31:57.4846028Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuhhm0l1g/_remote_module_non_scriptable.py 2022-05-18T04:31:57.5071307Z dist init r=0, world=2 2022-05-18T04:31:57.5075398Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:31:57.5117668Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgxv0hwt5 2022-05-18T04:31:57.5120276Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgxv0hwt5/_remote_module_non_scriptable.py 2022-05-18T04:31:57.5331256Z dist init r=1, world=2 2022-05-18T04:31:57.5335931Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:31:57.5336998Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:57.5382973Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:31:58.8855724Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:58.8856282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:59.6723881Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:59.6724425Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:59.6986385Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:31:59.6986948Z warnings.warn( 2022-05-18T04:31:59.6996320Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:31:59.6996875Z warnings.warn( 2022-05-18T04:31:59.7730560Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:59.7731230Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:59.7739612Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:31:59.7740327Z warnings.warn(msg, FutureWarning) 2022-05-18T04:31:59.8301424Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:31:59.8301933Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:00.2986856Z ok (3.733s) 2022-05-18T04:32:00.3121189Z test_transformer_parameterized_offload_false_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46251 2022-05-18T04:32:00.3226391Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46252 2022-05-18T04:32:01.2160236Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo83rsdv_ 2022-05-18T04:32:01.2161482Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo83rsdv_/_remote_module_non_scriptable.py 2022-05-18T04:32:01.2385375Z dist init r=1, world=2 2022-05-18T04:32:01.2390052Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:01.2476257Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7v6vvew7 2022-05-18T04:32:01.2479345Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7v6vvew7/_remote_module_non_scriptable.py 2022-05-18T04:32:01.2696749Z dist init r=0, world=2 2022-05-18T04:32:01.2701521Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:01.2702377Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:01.2799038Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:02.6366825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:02.6367601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:03.4148133Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:03.4148662Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:03.4410648Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:32:03.4411218Z warnings.warn( 2022-05-18T04:32:03.4417066Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:03.4417597Z warnings.warn( 2022-05-18T04:32:03.5101899Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:03.5102599Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:03.5106738Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:03.5107401Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:03.5653994Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:03.5656762Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:04.0321478Z ok (3.733s) 2022-05-18T04:32:04.0455216Z test_transformer_parameterized_offload_false_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46334 2022-05-18T04:32:04.0559978Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46335 2022-05-18T04:32:04.9595178Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc858mk5h 2022-05-18T04:32:04.9596250Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc858mk5h/_remote_module_non_scriptable.py 2022-05-18T04:32:04.9810319Z dist init r=1, world=2 2022-05-18T04:32:04.9814288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:05.0171763Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp16dj_89 2022-05-18T04:32:05.0174805Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp16dj_89/_remote_module_non_scriptable.py 2022-05-18T04:32:05.0397967Z dist init r=0, world=2 2022-05-18T04:32:05.0402758Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:05.0403558Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:05.0425785Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:06.4072516Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:06.4073047Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:07.1936760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:07.1937316Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
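The paired "Reducer buckets have been rebuilt in this iteration" INFO lines come from the DDP side of each parity test: after the first backward pass, DDP rebuilds its gradient-reduction buckets to match the order in which gradients were actually produced, and logs that once per rank. A minimal sketch, assuming a process group is already initialized (as in the earlier snippet) and one GPU is assigned to this rank:

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    model = nn.Linear(16, 16).cuda()
    ddp_model = DDP(model, device_ids=[torch.cuda.current_device()])

    for step in range(2):
        loss = ddp_model(torch.randn(4, 16, device="cuda")).sum()
        loss.backward()        # after the first backward, each rank logs the bucket-rebuild INFO line
        ddp_model.zero_grad()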
2022-05-18T04:32:07.2198752Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:07.2199335Z warnings.warn( 2022-05-18T04:32:07.2204544Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:07.2205115Z warnings.warn( 2022-05-18T04:32:07.2885475Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:07.2886157Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:07.2901255Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:07.2901919Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:07.3461773Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:07.3462487Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:07.7653114Z ok (3.733s) 2022-05-18T04:32:07.7787352Z test_transformer_parameterized_offload_false_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46417 2022-05-18T04:32:07.7892807Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46418 2022-05-18T04:32:08.6937555Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3cuswwjn 2022-05-18T04:32:08.6938664Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3cuswwjn/_remote_module_non_scriptable.py 2022-05-18T04:32:08.7047790Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkf6l4wvj 2022-05-18T04:32:08.7050383Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkf6l4wvj/_remote_module_non_scriptable.py 2022-05-18T04:32:08.7153776Z dist init r=1, world=2 2022-05-18T04:32:08.7158218Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:08.7265726Z dist init r=0, world=2 2022-05-18T04:32:08.7270266Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:08.7271731Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:08.7363860Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:32:10.0569810Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:10.0570586Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:10.8288943Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:10.8289772Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:10.8546467Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:10.8547111Z warnings.warn( 2022-05-18T04:32:10.8549911Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:10.8550448Z warnings.warn( 2022-05-18T04:32:10.9256434Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:10.9257463Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:10.9259248Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:10.9259917Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:10.9797229Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:10.9798261Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:11.3994213Z ok (3.634s) 2022-05-18T04:32:11.4127765Z test_transformer_parameterized_offload_false_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46500 2022-05-18T04:32:11.4235579Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46501 2022-05-18T04:32:12.3254313Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1jus0ny8 2022-05-18T04:32:12.3255573Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1jus0ny8/_remote_module_non_scriptable.py 2022-05-18T04:32:12.3265800Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa81orlav 2022-05-18T04:32:12.3269171Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa81orlav/_remote_module_non_scriptable.py 2022-05-18T04:32:12.3469261Z dist init r=0, world=2 2022-05-18T04:32:12.3473618Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:12.3489183Z dist init r=1, world=2 2022-05-18T04:32:12.3493555Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:12.3495104Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:32:12.3577407Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:13.7063215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:13.7064203Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:14.4920004Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:14.4940110Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:14.5181190Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:14.5182346Z warnings.warn( 2022-05-18T04:32:14.5210625Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:14.5211436Z warnings.warn( 2022-05-18T04:32:14.5932497Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:14.5933438Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:14.5946341Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:14.5947297Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:14.6500151Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:14.6507040Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:15.1327745Z ok (3.733s) 2022-05-18T04:32:15.1459372Z test_transformer_parameterized_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46583 2022-05-18T04:32:15.1565549Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46584 2022-05-18T04:32:16.0948747Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx4_5m0d4 2022-05-18T04:32:16.0949886Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx4_5m0d4/_remote_module_non_scriptable.py 2022-05-18T04:32:16.1165846Z dist init r=1, world=2 2022-05-18T04:32:16.1170487Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:16.1171012Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg95oh1ec 2022-05-18T04:32:16.1173930Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg95oh1ec/_remote_module_non_scriptable.py 2022-05-18T04:32:16.1394434Z dist init r=0, world=2 2022-05-18T04:32:16.1398803Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:16.1399615Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:16.1477135Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:17.4706267Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:17.4706812Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:18.2848780Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:18.2849310Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:18.3111952Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:18.3112521Z warnings.warn( 2022-05-18T04:32:18.3113284Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:18.3114131Z warnings.warn( 2022-05-18T04:32:18.3831489Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:18.3832172Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:18.3840669Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:18.3841341Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:18.4391738Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:18.4392243Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
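Each segment of the parameterized test names appears to map to an FSDP configuration knob plus an optional gradient-clipping norm: offload_{false,true}, prefetch_{pre,post}/none, {no_shard, shard_grad_op, none}, and clip_norm_type_{2_0, None}. The exact wiring lives in the test file; the sketch below only shows how the same options are expressed against the public FSDP API (process group assumed initialized as above):

    import torch
    from torch.distributed.fsdp import (
        FullyShardedDataParallel as FSDP,
        CPUOffload,
        BackwardPrefetch,
        ShardingStrategy,
    )

    module = torch.nn.Linear(8, 8).cuda()   # stand-in for the transformer the tests use
    fsdp_model = FSDP(
        module,
        cpu_offload=CPUOffload(offload_params=False),       # offload_false / offload_true
        backward_prefetch=BackwardPrefetch.BACKWARD_PRE,     # prefetch_pre / prefetch_post / none
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,    # shard_grad_op / no_shard / default full shard
    )

    fsdp_model(torch.randn(4, 8, device="cuda")).sum().backward()
    # clip_norm_type_2_0 clips gradients with an L2 norm; clip_norm_type_None skips clipping.
    fsdp_model.clip_grad_norm_(max_norm=1.0, norm_type=2.0)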
2022-05-18T04:32:18.8659890Z ok (3.733s) 2022-05-18T04:32:18.8791430Z test_transformer_parameterized_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46666 2022-05-18T04:32:18.8895932Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46667 2022-05-18T04:32:19.7880898Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpulej8x0d 2022-05-18T04:32:19.7882429Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpulej8x0d/_remote_module_non_scriptable.py 2022-05-18T04:32:19.8105558Z dist init r=1, world=2 2022-05-18T04:32:19.8110214Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:19.8233153Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphifwqzgf 2022-05-18T04:32:19.8235860Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphifwqzgf/_remote_module_non_scriptable.py 2022-05-18T04:32:19.8449326Z dist init r=0, world=2 2022-05-18T04:32:19.8453446Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:19.8454578Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:19.8518637Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:21.2031124Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:21.2031741Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:21.9815630Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:21.9818489Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:22.0084439Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:22.0085026Z warnings.warn( 2022-05-18T04:32:22.0091732Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:22.0092293Z warnings.warn( 2022-05-18T04:32:22.0818568Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:22.0819266Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:22.0829612Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:32:22.0830295Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:22.1397894Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:22.1398924Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:22.5990627Z ok (3.733s) 2022-05-18T04:32:22.6124690Z test_transformer_parameterized_offload_true_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46749 2022-05-18T04:32:22.6231572Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46750 2022-05-18T04:32:23.5193753Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplq6sa8m1 2022-05-18T04:32:23.5194927Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplq6sa8m1/_remote_module_non_scriptable.py 2022-05-18T04:32:23.5267077Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcylfvynd 2022-05-18T04:32:23.5269480Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcylfvynd/_remote_module_non_scriptable.py 2022-05-18T04:32:23.5418364Z dist init r=1, world=2 2022-05-18T04:32:23.5422585Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:23.5483717Z dist init r=0, world=2 2022-05-18T04:32:23.5487702Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:23.5488825Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:23.5525982Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:24.8917438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:24.8917968Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:25.6684865Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:25.6711377Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:25.6946572Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:25.6947158Z warnings.warn( 2022-05-18T04:32:25.6983453Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:25.6983986Z warnings.warn( 2022-05-18T04:32:25.7035403Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:25.7078501Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:25.7360609Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:25.7371316Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:25.8335871Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:25.8336539Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:25.8354318Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:25.8354986Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:25.8915852Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:25.8929468Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:26.3326995Z ok (3.733s) 2022-05-18T04:32:26.3461621Z test_transformer_parameterized_offload_true_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46832 2022-05-18T04:32:26.3571269Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46833 2022-05-18T04:32:27.2549542Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppzqod2d_ 2022-05-18T04:32:27.2550552Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppzqod2d_/_remote_module_non_scriptable.py 2022-05-18T04:32:27.2766015Z dist init r=1, world=2 2022-05-18T04:32:27.2770101Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:27.3075561Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphzd4p6j6 2022-05-18T04:32:27.3078242Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphzd4p6j6/_remote_module_non_scriptable.py 2022-05-18T04:32:27.3291690Z dist init r=0, world=2 2022-05-18T04:32:27.3295978Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:27.3297085Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:27.3382338Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:28.6433032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:28.6433558Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:29.4179613Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:29.4180175Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:32:29.4441689Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:29.4442256Z warnings.warn( 2022-05-18T04:32:29.4443041Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:29.4443586Z warnings.warn( 2022-05-18T04:32:29.4532501Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:29.4534209Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:29.4816169Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:29.4816669Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:29.5758564Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:29.5759259Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:29.5760197Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:29.5760845Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:29.6301284Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:29.6303518Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:30.0670797Z ok (3.734s) 2022-05-18T04:32:30.0803961Z test_transformer_parameterized_offload_true_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46915 2022-05-18T04:32:30.0910015Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46916 2022-05-18T04:32:31.0161799Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz3grkw6j 2022-05-18T04:32:31.0162918Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz3grkw6j/_remote_module_non_scriptable.py 2022-05-18T04:32:31.0358719Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl7k5kg65 2022-05-18T04:32:31.0361071Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl7k5kg65/_remote_module_non_scriptable.py 2022-05-18T04:32:31.0377188Z dist init r=1, world=2 2022-05-18T04:32:31.0381587Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:31.0574089Z dist init r=0, world=2 2022-05-18T04:32:31.0578071Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:31.0579324Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:31.0586429Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:32.4004492Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:32.4005018Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:33.1687877Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:33.1688439Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:33.1946602Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:33.1947173Z warnings.warn( 2022-05-18T04:32:33.1954184Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:33.1955093Z warnings.warn( 2022-05-18T04:32:33.2039240Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:33.2046188Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:33.2323378Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:33.2323892Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
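The UserWarning from fully_sharded_data_parallel.py:911 above is raised when the module handed to FSDP is still on CPU, so FSDP temporarily moves it to the rank's GPU for parameter verification, flattening, and sharding, then moves it back. One way to avoid that round trip is to place the module on the target device before wrapping; a minimal sketch, assuming the process group is already initialized in this worker:

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(8, 8).cuda(local_rank)  # placeholder module, already on GPU
fsdp_model = FSDP(model)                        # no "Module is input on CPU" warning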
2022-05-18T04:32:33.3287557Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:33.3288238Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:33.3289177Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:33.3289807Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:33.3824401Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:33.3824918Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:33.9007271Z ok (3.833s) 2022-05-18T04:32:33.9141153Z test_transformer_parameterized_offload_true_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46998 2022-05-18T04:32:33.9247352Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46999 2022-05-18T04:32:34.8250199Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi55r30jd 2022-05-18T04:32:34.8255683Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi55r30jd/_remote_module_non_scriptable.py 2022-05-18T04:32:34.8471644Z dist init r=0, world=2 2022-05-18T04:32:34.8475883Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:34.8623306Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj0ynkozc 2022-05-18T04:32:34.8626660Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj0ynkozc/_remote_module_non_scriptable.py 2022-05-18T04:32:34.8846832Z dist init r=1, world=2 2022-05-18T04:32:34.8851885Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:34.8852728Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:34.8883892Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:36.2188839Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:36.2189361Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:37.0016814Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:37.0040778Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:37.0283517Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:32:37.0284196Z warnings.warn( 2022-05-18T04:32:37.0311281Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:37.0311820Z warnings.warn( 2022-05-18T04:32:37.0378223Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:37.0407690Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:37.0696113Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:37.0696753Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:37.1691751Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:37.1692501Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:37.1694244Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:37.1694908Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:37.2255620Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:37.2256369Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:37.7344987Z ok (3.834s) 2022-05-18T04:32:37.7479321Z test_transformer_parameterized_offload_true_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47081 2022-05-18T04:32:37.7583868Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47082 2022-05-18T04:32:38.6966176Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5uhqoy27 2022-05-18T04:32:38.6967441Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5uhqoy27/_remote_module_non_scriptable.py 2022-05-18T04:32:38.7039812Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpww35ptdc 2022-05-18T04:32:38.7042686Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpww35ptdc/_remote_module_non_scriptable.py 2022-05-18T04:32:38.7181960Z dist init r=0, world=2 2022-05-18T04:32:38.7186487Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:38.7264780Z dist init r=1, world=2 2022-05-18T04:32:38.7269282Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:38.7270393Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:38.7289759Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:40.0803668Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:40.0804207Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:40.8571285Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:40.8592897Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:40.8842528Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:40.8843119Z warnings.warn( 2022-05-18T04:32:40.8858728Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:40.8859306Z warnings.warn( 2022-05-18T04:32:40.8935552Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:40.8954783Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:40.9247149Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:40.9247685Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
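The "dist init r=N, world=2" and store_based_barrier_key lines above show the two worker processes joining a 2-rank process group; the store-based barrier is how init_process_group waits until every rank has registered, which is what the "Completed store-based barrier ... with 2 nodes" messages report. Roughly, each spawned worker does something like this (the NCCL backend and the rendezvous address are illustrative assumptions, not values taken from the log):

import torch.distributed as dist

def init_worker(rank: int, world_size: int = 2) -> None:
    # Registers this rank in a shared store, then blocks on a store-based
    # barrier until all world_size ranks have joined.
    dist.init_process_group(
        backend="nccl",                       # assumed backend for a GPU job
        init_method="tcp://127.0.0.1:29500",  # illustrative rendezvous address
        rank=rank,
        world_size=world_size,
    )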
2022-05-18T04:32:41.0246045Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:41.0246756Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:41.0249983Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:41.0250686Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:41.0817890Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:41.0818413Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:41.5679275Z ok (3.833s) 2022-05-18T04:32:41.5812987Z test_transformer_parameterized_offload_true_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47164 2022-05-18T04:32:41.5921139Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47165 2022-05-18T04:32:42.4809586Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvfmskq28 2022-05-18T04:32:42.4810443Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvfmskq28/_remote_module_non_scriptable.py 2022-05-18T04:32:42.5032362Z dist init r=0, world=2 2022-05-18T04:32:42.5036814Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:42.5316352Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjyt0mtje 2022-05-18T04:32:42.5319114Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjyt0mtje/_remote_module_non_scriptable.py 2022-05-18T04:32:42.5533830Z dist init r=1, world=2 2022-05-18T04:32:42.5538146Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:42.5539296Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:42.5546933Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:43.9033447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:43.9033972Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:44.6855158Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:44.6855736Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:44.7124858Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:32:44.7125416Z warnings.warn( 2022-05-18T04:32:44.7129634Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:44.7130175Z warnings.warn( 2022-05-18T04:32:44.7218878Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:44.7225688Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:44.7518848Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:44.7519642Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:44.8522998Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:44.8523701Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:44.8530767Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:44.8531438Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:44.9100374Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:44.9100890Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:45.4016534Z ok (3.834s) 2022-05-18T04:32:45.4150076Z test_transformer_parameterized_offload_true_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47247 2022-05-18T04:32:45.4255064Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47248 2022-05-18T04:32:46.3228763Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2a5eh8da 2022-05-18T04:32:46.3229642Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2a5eh8da/_remote_module_non_scriptable.py 2022-05-18T04:32:46.3259042Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt_jhyhjc 2022-05-18T04:32:46.3261857Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt_jhyhjc/_remote_module_non_scriptable.py 2022-05-18T04:32:46.3443253Z dist init r=0, world=2 2022-05-18T04:32:46.3447305Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:46.3485145Z dist init r=1, world=2 2022-05-18T04:32:46.3489354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:46.3490522Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:46.3550784Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:47.6953463Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:47.6954002Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:48.4674426Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:48.4675503Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:48.4937753Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:48.4938844Z warnings.warn( 2022-05-18T04:32:48.4940896Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:48.4941969Z warnings.warn( 2022-05-18T04:32:48.5031307Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:48.5034063Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:48.5311864Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:48.5321093Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
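The repeated "Reducer buckets have been rebuilt in this iteration" INFO lines come from torch.nn.parallel.DistributedDataParallel: after the first backward pass its reducer regroups gradients into communication buckets in the order they were actually produced, and logs this message when it does so. In outline, the DDP baseline these parity tests compare FSDP against is built like this (module and bucket size are illustrative):

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

local_rank = torch.cuda.current_device()          # assumes set_device() was called
model = torch.nn.Linear(8, 8).cuda(local_rank)    # placeholder module
ddp_model = DDP(model, device_ids=[local_rank], bucket_cap_mb=25)
# After the first iteration DDP emits "Reducer buckets have been rebuilt"
# once gradient buckets have been reordered to match real execution order.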
2022-05-18T04:32:48.6276200Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:48.6277613Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:48.6280895Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:48.6282299Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:48.6837280Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:48.6845134Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:49.1348599Z ok (3.733s) 2022-05-18T04:32:49.1482460Z test_transformer_parameterized_offload_true_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47330 2022-05-18T04:32:49.1588699Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47331 2022-05-18T04:32:50.0723671Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfnftlq1r 2022-05-18T04:32:50.0724663Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfnftlq1r/_remote_module_non_scriptable.py 2022-05-18T04:32:50.0926234Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_5vbwdje 2022-05-18T04:32:50.0928620Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_5vbwdje/_remote_module_non_scriptable.py 2022-05-18T04:32:50.0938035Z dist init r=0, world=2 2022-05-18T04:32:50.0942086Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:50.1140421Z dist init r=1, world=2 2022-05-18T04:32:50.1144552Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:50.1145685Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:50.1147017Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:51.4557453Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:51.4558026Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:52.2302432Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:52.2324818Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:52.2566911Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:32:52.2567486Z warnings.warn( 2022-05-18T04:32:52.2586653Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:52.2587210Z warnings.warn( 2022-05-18T04:32:52.2657494Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:52.2677445Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:52.2960285Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:52.2960800Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:52.3909105Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:52.3909928Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:52.3911035Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:52.3911700Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:52.4455079Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:52.4455633Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:52.8683564Z ok (3.733s) 2022-05-18T04:32:52.8816289Z test_transformer_parameterized_offload_true_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47413 2022-05-18T04:32:52.8922274Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47414 2022-05-18T04:32:53.7915531Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2hmur4z1 2022-05-18T04:32:53.7916777Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2hmur4z1/_remote_module_non_scriptable.py 2022-05-18T04:32:53.8139301Z dist init r=1, world=2 2022-05-18T04:32:53.8143628Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:53.8192274Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpru7_z11y 2022-05-18T04:32:53.8195092Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpru7_z11y/_remote_module_non_scriptable.py 2022-05-18T04:32:53.8409233Z dist init r=0, world=2 2022-05-18T04:32:53.8413719Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:53.8414917Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:53.8450710Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:55.1851394Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:55.1851921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:55.9648064Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:55.9669137Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:55.9909507Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:55.9910094Z warnings.warn( 2022-05-18T04:32:55.9940567Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:55.9941097Z warnings.warn( 2022-05-18T04:32:56.0002388Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:56.0038189Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:56.0321127Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:56.0329442Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
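The "[W python_variable.cpp:205]" warning above fires when a Tensor is deallocated while a Python-side weak reference to it is still alive; the message itself names Tensor._fix_weakref() as the remedy after dereferencing. As a generic, standard-library illustration of the weak-reference pattern it refers to (this does not reproduce the internal resurrection path the warning is about):

import weakref
import torch

t = torch.zeros(4)
wr = weakref.ref(t)   # weak reference: does not keep the tensor alive
print(wr() is t)      # True while the strong reference exists
del t                 # drop the only strong reference
print(wr())           # None: the referent has been deallocated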
2022-05-18T04:32:56.1322996Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:56.1323692Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:56.1333401Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:56.1334072Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:56.1885427Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:56.1894759Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:56.7018094Z ok (3.833s) 2022-05-18T04:32:56.7153985Z test_transformer_parameterized_offload_true_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47496 2022-05-18T04:32:56.7262300Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47497 2022-05-18T04:32:57.6661121Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwlinltyo 2022-05-18T04:32:57.6662053Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbpxilq_7 2022-05-18T04:32:57.6664253Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwlinltyo/_remote_module_non_scriptable.py 2022-05-18T04:32:57.6665348Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbpxilq_7/_remote_module_non_scriptable.py 2022-05-18T04:32:57.6881342Z dist init r=0, world=2 2022-05-18T04:32:57.6884389Z dist init r=1, world=2 2022-05-18T04:32:57.6885086Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:32:57.6888596Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:32:57.6889390Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:57.6989606Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:32:59.0382518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:59.0383028Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:59.8167189Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:59.8173012Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:59.8429018Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:32:59.8429683Z warnings.warn( 2022-05-18T04:32:59.8439322Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:32:59.8439876Z warnings.warn( 2022-05-18T04:32:59.8521308Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:59.8533515Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:32:59.8810835Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:59.8817755Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:32:59.9799971Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:59.9800673Z warnings.warn(msg, FutureWarning) 2022-05-18T04:32:59.9812726Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:32:59.9813402Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:00.0357569Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:00.0362869Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:00.5358897Z ok (3.834s) 2022-05-18T04:33:00.5492890Z test_transformer_parameterized_offload_true_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47579 2022-05-18T04:33:00.5598288Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47580 2022-05-18T04:33:01.4512456Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_pgd8h4v 2022-05-18T04:33:01.4513576Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_pgd8h4v/_remote_module_non_scriptable.py 2022-05-18T04:33:01.4729275Z dist init r=0, world=2 2022-05-18T04:33:01.4733577Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:33:01.4876826Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyhdxhzug 2022-05-18T04:33:01.4879277Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyhdxhzug/_remote_module_non_scriptable.py 2022-05-18T04:33:01.5097322Z dist init r=1, world=2 2022-05-18T04:33:01.5102143Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:33:01.5103134Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:01.5142219Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:02.8515885Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:02.8516405Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:03.6287622Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:03.6288268Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:03.6553010Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:03.6554061Z warnings.warn( 2022-05-18T04:33:03.6556378Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:03.6556930Z warnings.warn( 2022-05-18T04:33:03.6649266Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:03.6650678Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:03.6940016Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:03.6940563Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:33:03.7940017Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:03.7940699Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:03.7947523Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:03.7948196Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:03.8504926Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:03.8505459Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:04.3695057Z ok (3.833s) 2022-05-18T04:33:04.3828829Z test_transformer_parameterized_offload_true_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47662 2022-05-18T04:33:04.3933820Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47663 2022-05-18T04:33:05.2874812Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb3mm5yyc 2022-05-18T04:33:05.2875721Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb3mm5yyc/_remote_module_non_scriptable.py 2022-05-18T04:33:05.2895067Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8ra_e14f 2022-05-18T04:33:05.2898214Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8ra_e14f/_remote_module_non_scriptable.py 2022-05-18T04:33:05.3090467Z dist init r=1, world=2 2022-05-18T04:33:05.3094554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:33:05.3121892Z dist init r=0, world=2 2022-05-18T04:33:05.3126253Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:33:05.3127181Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:05.3198412Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:06.6548532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:06.6549040Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:07.4417171Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:07.4417717Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:07.4680841Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:33:07.4681391Z warnings.warn( 2022-05-18T04:33:07.4690115Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:07.4690658Z warnings.warn( 2022-05-18T04:33:07.4774584Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:07.4786457Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:07.5081049Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:07.5081535Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:07.6087116Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:07.6087793Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:07.6103767Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:07.6104659Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:07.6673611Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:07.6674155Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:08.1029333Z ok (3.733s) 2022-05-18T04:33:08.1164663Z test_transformer_parameterized_offload_true_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47745 2022-05-18T04:33:08.1269675Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47746 2022-05-18T04:33:09.0226716Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_qqiz4oe 2022-05-18T04:33:09.0228666Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_qqiz4oe/_remote_module_non_scriptable.py 2022-05-18T04:33:09.0252915Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0vuf8agd 2022-05-18T04:33:09.0255813Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0vuf8agd/_remote_module_non_scriptable.py 2022-05-18T04:33:09.0444669Z dist init r=1, world=2 2022-05-18T04:33:09.0448834Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:33:09.0476303Z dist init r=0, world=2 2022-05-18T04:33:09.0480400Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:33:09.0481457Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:09.0552406Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:10.3981813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:10.3982382Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:11.1835448Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:11.1836033Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:11.2102716Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:11.2103307Z warnings.warn( 2022-05-18T04:33:11.2106097Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:11.2106633Z warnings.warn( 2022-05-18T04:33:11.2196082Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:11.2200112Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:11.2492878Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:11.2493385Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
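Each TestParityWithDDP case name above encodes one FSDP configuration; for example test_transformer_parameterized_offload_true_prefetch_pre_no_shard_clip_norm_type_2_0 reads as parameter CPU offload enabled, pre-backward prefetching, the NO_SHARD strategy, and gradient clipping with an L2 norm. A hedged sketch of wrapping a module with that combination, assuming an initialized process group and inferring the option mapping from the name tokens rather than from the test source:

import torch
from torch.distributed.fsdp import (
    BackwardPrefetch,
    CPUOffload,
    FullyShardedDataParallel as FSDP,
    ShardingStrategy,
)

model = torch.nn.Linear(8, 8).cuda()                        # placeholder module
fsdp_model = FSDP(
    model,
    cpu_offload=CPUOffload(offload_params=True),            # "offload_true"
    backward_prefetch=BackwardPrefetch.BACKWARD_PRE,        # "prefetch_pre"
    sharding_strategy=ShardingStrategy.NO_SHARD,            # "no_shard"
)
loss = fsdp_model(torch.randn(2, 8).cuda()).sum()
loss.backward()
fsdp_model.clip_grad_norm_(max_norm=1.0, norm_type=2.0)     # "clip_norm_type_2_0"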
2022-05-18T04:33:11.3471681Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:11.3472373Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:11.3473320Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:11.3473993Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:11.4037370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:11.4037860Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:11.8365115Z ok (3.733s) 2022-05-18T04:33:11.8499868Z test_transformer_parameterized_offload_true_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47828 2022-05-18T04:33:11.8608450Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47829 2022-05-18T04:33:12.7489934Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0rtuxh30 2022-05-18T04:33:12.7491187Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0rtuxh30/_remote_module_non_scriptable.py 2022-05-18T04:33:12.7518002Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe8eoczjn 2022-05-18T04:33:12.7520469Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe8eoczjn/_remote_module_non_scriptable.py 2022-05-18T04:33:12.7705725Z dist init r=0, world=2 2022-05-18T04:33:12.7710068Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:33:12.7749225Z dist init r=1, world=2 2022-05-18T04:33:12.7754131Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:33:12.7755564Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:12.7814268Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:14.1169996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:14.1170949Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:14.8875782Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:14.8876860Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:14.9136571Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:33:14.9137659Z warnings.warn( 2022-05-18T04:33:14.9140507Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:14.9141526Z warnings.warn( 2022-05-18T04:33:14.9226863Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:14.9235002Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:14.9514300Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:14.9520591Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:15.0465432Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:15.0466853Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:15.0468821Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:15.0470191Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:15.1006473Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:15.1011432Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:15.5702205Z ok (3.734s) 2022-05-18T04:33:15.5836400Z test_transformer_parameterized_offload_true_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47911 2022-05-18T04:33:15.5941745Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47912 2022-05-18T04:33:16.4808810Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa5ibljy2 2022-05-18T04:33:16.4810116Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa5ibljy2/_remote_module_non_scriptable.py 2022-05-18T04:33:16.5024928Z dist init r=1, world=2 2022-05-18T04:33:16.5029169Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:33:16.5126214Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwbvhl8zi 2022-05-18T04:33:16.5128881Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwbvhl8zi/_remote_module_non_scriptable.py 2022-05-18T04:33:16.5339529Z dist init r=0, world=2 2022-05-18T04:33:16.5343613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:33:16.5344856Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:16.5437590Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:17.8661423Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:17.8661945Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:18.6492004Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:18.6514926Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:18.6753081Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:18.6753734Z warnings.warn( 2022-05-18T04:33:18.6783408Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:18.6784234Z warnings.warn( 2022-05-18T04:33:18.6844738Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:18.6879573Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:18.7163147Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:18.7165684Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
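The interleaved "Reducer buckets have been rebuilt in this iteration." lines come from DistributedDataParallel's gradient reducer, which typically reorders its buckets once it has observed a real backward pass. A minimal sketch of the wrapping that produces them; the model, batch, and bucket size are illustrative, and an initialized process group with one GPU per rank is assumed:

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    rank = dist.get_rank()                    # assumes init_process_group was already called
    model = torch.nn.Linear(8, 8).cuda(rank)
    ddp_model = DDP(model, device_ids=[rank], bucket_cap_mb=25)  # gradient buckets sized by bucket_cap_mb

    out = ddp_model(torch.randn(4, 8, device=rank))
    out.sum().backward()                      # bucket rebuilding is usually logged once, early in training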
2022-05-18T04:33:18.8158191Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:18.8158895Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:18.8166146Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:18.8167060Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:18.8720751Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:18.8724438Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:19.3035737Z ok (3.733s) 2022-05-18T04:33:19.3169911Z test_transformer_parameterized_offload_true_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47994 2022-05-18T04:33:19.3275264Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47995 2022-05-18T04:33:20.2286632Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3gtnucje 2022-05-18T04:33:20.2287919Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3gtnucje/_remote_module_non_scriptable.py 2022-05-18T04:33:20.2317750Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpds2fc5t9 2022-05-18T04:33:20.2320509Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpds2fc5t9/_remote_module_non_scriptable.py 2022-05-18T04:33:20.2504574Z dist init r=0, world=2 2022-05-18T04:33:20.2508854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:33:20.2541682Z dist init r=1, world=2 2022-05-18T04:33:20.2546625Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:33:20.2547763Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:20.2612338Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:21.5936659Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:21.5937182Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:22.3713076Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:22.3733489Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:22.3978934Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:33:22.3979845Z warnings.warn( 2022-05-18T04:33:22.4002987Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:22.4003520Z warnings.warn( 2022-05-18T04:33:22.4070847Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:22.4099870Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:22.4387804Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:22.4388313Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:22.5384008Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:22.5385245Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:22.5388580Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:22.5389260Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:22.5944250Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:22.5944764Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:23.0371199Z ok (3.733s) 2022-05-18T04:33:23.0504961Z test_transformer_parameterized_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48077 2022-05-18T04:33:23.0609580Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48078 2022-05-18T04:33:23.9551223Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0lq1_s7z 2022-05-18T04:33:23.9552328Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0lq1_s7z/_remote_module_non_scriptable.py 2022-05-18T04:33:23.9768327Z dist init r=0, world=2 2022-05-18T04:33:23.9772784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:33:23.9991738Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp803wi88y 2022-05-18T04:33:23.9994457Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp803wi88y/_remote_module_non_scriptable.py 2022-05-18T04:33:24.0215620Z dist init r=1, world=2 2022-05-18T04:33:24.0219771Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:33:24.0220758Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:24.0283088Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:25.3475144Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:25.3475679Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:26.1319251Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:26.1339794Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:26.1579358Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:26.1579927Z warnings.warn( 2022-05-18T04:33:26.1609863Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:26.1610411Z warnings.warn( 2022-05-18T04:33:26.1670965Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:26.1707024Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:26.1987992Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:26.1995802Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:33:26.2992222Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:26.2992929Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:26.3008583Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:26.3009259Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:26.3565103Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:26.3574140Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:26.8705739Z ok (3.833s) 2022-05-18T04:33:26.8839607Z test_transformer_parameterized_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48160 2022-05-18T04:33:26.8946924Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48161 2022-05-18T04:33:27.7818500Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6zi61e4r 2022-05-18T04:33:27.7819719Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6zi61e4r/_remote_module_non_scriptable.py 2022-05-18T04:33:27.7963261Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplodcjh_t 2022-05-18T04:33:27.7965888Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplodcjh_t/_remote_module_non_scriptable.py 2022-05-18T04:33:27.8044265Z dist init r=1, world=2 2022-05-18T04:33:27.8048676Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:33:27.8179502Z dist init r=0, world=2 2022-05-18T04:33:27.8183459Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:33:27.8184955Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:27.8254193Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:29.1814307Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:29.1814869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:29.9602592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:29.9627143Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:29.9872376Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
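Interleaved with the FSDP messages, every test in this suite also emits the torch.testing.assert_allclose FutureWarning shown above. A minimal sketch of the migration the warning itself suggests; the tensors and tolerances are illustrative:

    import torch
    from torch.testing import assert_close

    expected = torch.randn(4)
    actual = expected.clone()

    # torch.testing.assert_allclose(actual, expected)       # deprecated since 1.12
    assert_close(actual, expected)                           # also checks dtype and device by default
    assert_close(actual, expected, rtol=1.3e-6, atol=1e-5)   # explicit tolerances remain supported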
2022-05-18T04:33:29.9872947Z warnings.warn( 2022-05-18T04:33:29.9905115Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:33:29.9905673Z warnings.warn( 2022-05-18T04:33:29.9966108Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:30.0005505Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:33:30.0302428Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:30.0303097Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:30.1336635Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:30.1337352Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:30.1350464Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:33:30.1351135Z warnings.warn(msg, FutureWarning) 2022-05-18T04:33:30.1929114Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:30.1932038Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:33:30.7040868Z ok (3.833s) 2022-05-18T04:33:30.7041060Z 2022-05-18T04:33:30.7041465Z ---------------------------------------------------------------------- 2022-05-18T04:33:30.7041810Z Ran 203 tests in 723.270s 2022-05-18T04:33:30.7041983Z 2022-05-18T04:33:30.7042078Z OK 2022-05-18T04:33:30.7042197Z 2022-05-18T04:33:30.7042328Z Generating XML reports... 2022-05-18T04:33:30.7105262Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestHooks-20220518042127.xml 2022-05-18T04:33:30.7108730Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestNoGrad-20220518042127.xml 2022-05-18T04:33:30.7112800Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParamInit-20220518042127.xml 2022-05-18T04:33:30.7283639Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParityWithDDP-20220518042127.xml 2022-05-18T04:33:31.0050372Z Running distributed/test_c10d_nccl ... [2022-05-18 04:33:31.004554] 2022-05-18T04:33:31.0051096Z Executing ['/opt/conda/bin/python', 'distributed/test_c10d_nccl.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 04:33:31.004662] 2022-05-18T04:33:31.9355839Z , <__main__.CommTest testMethod=test_broadcast_coalesced_nccl>, <__main__.CommTest testMethod=test_nccl_barrier>, <__main__.CommTest testMethod=test_nccl_barrier_device_ids>, <__main__.CommTest testMethod=test_nccl_barrier_device_ids_function_argument>, <__main__.CommTest testMethod=test_nccl_barrier_timeout>, <__main__.CommTest testMethod=test_nccl_barrier_timeout_new_group>, <__main__.CommTest testMethod=test_nccl_barrier_timeout_new_group_non_member>, <__main__.CommTest testMethod=test_nccl_warn_not_in_group_debug_detail>, <__main__.CommTest testMethod=test_nccl_warn_not_in_group_debug_info>, <__main__.CommTest testMethod=test_nccl_warn_not_in_group_debug_off>, <__main__.CommTest testMethod=test_pass_nccl_options_high_priority_stream>, <__main__.CommTest testMethod=test_sequence_num_incremented_nccl_default>, <__main__.CommTest testMethod=test_sequence_num_incremented_nccl_subgroup>, <__main__.CommTest testMethod=test_sequence_num_set_default_pg_nccl>, <__main__.CommTest testMethod=test_sequence_num_set_nccl_new_group>]> 2022-05-18T04:33:31.9358402Z test_all_reduce_coalesced_nccl (__main__.CommTest) 2022-05-18T04:33:31.9358785Z test_broadcast_coalesced_nccl (__main__.CommTest) 2022-05-18T04:33:31.9359288Z test_nccl_barrier (__main__.CommTest) 2022-05-18T04:33:31.9359862Z test_nccl_barrier_device_ids (__main__.CommTest) 2022-05-18T04:33:31.9360523Z test_nccl_barrier_device_ids_function_argument (__main__.CommTest) 2022-05-18T04:33:31.9361138Z test_nccl_barrier_timeout (__main__.CommTest) 2022-05-18T04:33:31.9361761Z test_nccl_barrier_timeout_new_group (__main__.CommTest) 2022-05-18T04:33:31.9362340Z test_nccl_barrier_timeout_new_group_non_member (__main__.CommTest) 2022-05-18T04:33:31.9362700Z test_nccl_warn_not_in_group_debug_detail (__main__.CommTest) 2022-05-18T04:33:31.9363074Z test_nccl_warn_not_in_group_debug_info (__main__.CommTest) 2022-05-18T04:33:31.9363428Z test_nccl_warn_not_in_group_debug_off (__main__.CommTest) 2022-05-18T04:33:31.9363777Z test_pass_nccl_options_high_priority_stream (__main__.CommTest) 2022-05-18T04:33:31.9364311Z test_sequence_num_incremented_nccl_default (__main__.CommTest) 2022-05-18T04:33:31.9364815Z test_sequence_num_incremented_nccl_subgroup (__main__.CommTest) 2022-05-18T04:33:31.9365183Z test_sequence_num_set_default_pg_nccl (__main__.CommTest) 2022-05-18T04:33:31.9365524Z test_sequence_num_set_nccl_new_group (__main__.CommTest) 2022-05-18T04:33:31.9374690Z , <__main__.DistributedDataParallelTest testMethod=test_accumulate_gradients_module_with_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_arbitrary_forward_return_value>, <__main__.DistributedDataParallelTest testMethod=test_arbitrary_forward_return_value_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_bf16_compress_wrapper_is_view>, <__main__.DistributedDataParallelTest testMethod=test_bf16_compress_wrapper_nccl>, <__main__.DistributedDataParallelTest testMethod=test_builtin_ddp_comm_hooks_nccl>, <__main__.DistributedDataParallelTest testMethod=test_builtin_ddp_comm_hooks_nccl_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_dynamic_module>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_dynamic_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_True>, 
<__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_hook_nccl>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_hook_nccl_static_graph>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_with_then_hook_nccl>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_future_passing_gpu_nccl>, <__main__.DistributedDataParallelTest testMethod=test_ddp_multi_device_module_config>, <__main__.DistributedDataParallelTest testMethod=test_ddp_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_with_lazy_parameters>, <__main__.DistributedDataParallelTest testMethod=test_default_ddp_comm_hooks_nccl>, <__main__.DistributedDataParallelTest testMethod=test_default_ddp_comm_hooks_nccl_is_view>, <__main__.DistributedDataParallelTest testMethod=test_failure_recovery>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_debug_detail>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_debug_info>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_debug_off>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_grad_is_view_debug_detail>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_grad_is_view_debug_info>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_grad_is_view_debug_off>, <__main__.DistributedDataParallelTest testMethod=test_fp16>, <__main__.DistributedDataParallelTest testMethod=test_fp16_compress_wrapper_is_view>, <__main__.DistributedDataParallelTest testMethod=test_fp16_compress_wrapper_nccl>, <__main__.DistributedDataParallelTest testMethod=test_fp16_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_grad_layout_1devicemodule_1replicaperprocess>, <__main__.DistributedDataParallelTest testMethod=test_grad_layout_2devicemodule>, <__main__.DistributedDataParallelTest testMethod=test_invalid_powerSGD_state>, <__main__.DistributedDataParallelTest testMethod=test_multiple_outputs_multiple_backward>, <__main__.DistributedDataParallelTest testMethod=test_multiple_outputs_multiple_backward_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_1gpu_module_device_ids_integer_list>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_1gpu_module_device_ids_torch_device_list>, 
<__main__.DistributedDataParallelTest testMethod=test_nccl_backend_2gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_4gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_multi_device_ids_not_allowed>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_multi_device_module_device_ids_None>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_single_device_module_device_ids_None>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_single_device_module_empty_device_ids>, <__main__.DistributedDataParallelTest testMethod=test_nccl_propagate_error_reason>, <__main__.DistributedDataParallelTest testMethod=test_no_grad>, <__main__.DistributedDataParallelTest testMethod=test_param_layout_mismatch_error>, <__main__.DistributedDataParallelTest testMethod=test_pass_default_pg>, <__main__.DistributedDataParallelTest testMethod=test_powerSGD_ddp_comm_hook_nccl>, <__main__.DistributedDataParallelTest testMethod=test_powerSGD_ddp_comm_hook_nccl_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_empty_input>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_only_empty_input>]> 2022-05-18T04:33:31.9383522Z test_accumulate_gradients_module (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9384865Z test_accumulate_gradients_module_with_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9385327Z test_arbitrary_forward_return_value (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9385795Z test_arbitrary_forward_return_value_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9386256Z test_bf16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9386694Z test_bf16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9387216Z test_builtin_ddp_comm_hooks_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9387717Z test_builtin_ddp_comm_hooks_nccl_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9388183Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9388656Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9389125Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9389613Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9390119Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9390646Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9391133Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9391629Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9392109Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9392586Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9393089Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9393602Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9394109Z 
test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9394572Z test_ddp_comm_hook_allreduce_hook_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9395041Z test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9395527Z test_ddp_comm_hook_allreduce_hook_nccl_static_graph (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9395985Z test_ddp_comm_hook_allreduce_with_then_hook_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9396453Z test_ddp_comm_hook_future_passing_gpu_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9396904Z test_ddp_multi_device_module_config (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9397335Z test_ddp_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9397738Z test_ddp_with_lazy_parameters (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9398250Z test_default_ddp_comm_hooks_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9398709Z test_default_ddp_comm_hooks_nccl_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9399122Z test_failure_recovery (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9399565Z test_find_unused_parameters_kwarg_debug_detail (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9400047Z test_find_unused_parameters_kwarg_debug_info (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9400519Z test_find_unused_parameters_kwarg_debug_off (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9400994Z test_find_unused_parameters_kwarg_grad_is_view_debug_detail (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9401501Z test_find_unused_parameters_kwarg_grad_is_view_debug_info (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9401999Z test_find_unused_parameters_kwarg_grad_is_view_debug_off (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9402413Z test_fp16 (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9402820Z test_fp16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9403256Z test_fp16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9403673Z test_fp16_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9404177Z test_grad_layout_1devicemodule_1replicaperprocess (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9404673Z test_grad_layout_2devicemodule (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9405102Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9405526Z test_multiple_outputs_multiple_backward (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9406004Z test_multiple_outputs_multiple_backward_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9406497Z test_nccl_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9406994Z test_nccl_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9407440Z test_nccl_backend_2gpu_module (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9407867Z test_nccl_backend_4gpu_module (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9408318Z test_nccl_backend_multi_device_ids_not_allowed (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9408782Z test_nccl_backend_multi_device_module_device_ids_None 
(__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9409274Z test_nccl_backend_single_device_module_device_ids_None (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9409767Z test_nccl_backend_single_device_module_empty_device_ids (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9410229Z test_nccl_propagate_error_reason (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9410613Z test_no_grad (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9411027Z test_param_layout_mismatch_error (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9411444Z test_pass_default_pg (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9411846Z test_powerSGD_ddp_comm_hook_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9412299Z test_powerSGD_ddp_comm_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9412754Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9413175Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) 2022-05-18T04:33:31.9413559Z 2022-05-18T04:33:31.9414867Z , <__main__.NcclErrorHandlingTest testMethod=test_nccl_blocking_wait_with_barrier>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_abort>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_clean_exit>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_nonzero_exit>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_sigkill>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_sigterm>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_nonblocking>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_timeout>]> 2022-05-18T04:33:31.9416128Z test_invalid_nccl_blocking_wait_env (__main__.NcclErrorHandlingTest) 2022-05-18T04:33:31.9416545Z test_nccl_blocking_wait_with_barrier (__main__.NcclErrorHandlingTest) 2022-05-18T04:33:31.9416934Z test_nccl_errors_blocking_abort (__main__.NcclErrorHandlingTest) 2022-05-18T04:33:31.9417342Z test_nccl_errors_blocking_clean_exit (__main__.NcclErrorHandlingTest) 2022-05-18T04:33:31.9417754Z test_nccl_errors_blocking_nonzero_exit (__main__.NcclErrorHandlingTest) 2022-05-18T04:33:31.9418160Z test_nccl_errors_blocking_sigkill (__main__.NcclErrorHandlingTest) 2022-05-18T04:33:31.9418550Z test_nccl_errors_blocking_sigterm (__main__.NcclErrorHandlingTest) 2022-05-18T04:33:31.9418948Z test_nccl_errors_nonblocking (__main__.NcclErrorHandlingTest) 2022-05-18T04:33:31.9419317Z test_nccl_timeout (__main__.NcclErrorHandlingTest) 2022-05-18T04:33:31.9419758Z ]> 2022-05-18T04:33:31.9420219Z test_init_no_gpus (__main__.ProcessGroupNCCLNoGPUTest) 2022-05-18T04:33:31.9422130Z , <__main__.ProcessGroupNCCLTest testMethod=test_allgather_base_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_allgather_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_allreduce_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_barrier>, <__main__.ProcessGroupNCCLTest testMethod=test_broadcast_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_empty_tensors>, <__main__.ProcessGroupNCCLTest testMethod=test_gather_checks>, <__main__.ProcessGroupNCCLTest testMethod=test_gather_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_gather_stress>, <__main__.ProcessGroupNCCLTest testMethod=test_reduce_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_reduce_scatter_base_basics>, <__main__.ProcessGroupNCCLTest testMethod=test_reduce_scatter_base_ops>, 
<__main__.ProcessGroupNCCLTest testMethod=test_reduce_scatter_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_scatter_checks>, <__main__.ProcessGroupNCCLTest testMethod=test_scatter_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_scatter_stress>]> 2022-05-18T04:33:31.9424640Z test_allgather_base_basics (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9425013Z test_allgather_base_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9425382Z test_allgather_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9425742Z test_allreduce_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9426074Z test_barrier (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9426428Z test_broadcast_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9426787Z test_empty_tensors (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9427122Z test_gather_checks (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9427477Z test_gather_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9427827Z test_gather_stress (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9428184Z test_reduce_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9428540Z test_reduce_scatter_base_basics (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9428934Z test_reduce_scatter_base_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9429312Z test_reduce_scatter_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9429660Z test_scatter_checks (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9430014Z test_scatter_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9430368Z test_scatter_stress (__main__.ProcessGroupNCCLTest) 2022-05-18T04:33:31.9430870Z ]> 2022-05-18T04:33:31.9431299Z test_common_errors (__main__.RendezvousEnvTest) 2022-05-18T04:33:31.9431625Z 2022-05-18T04:33:31.9432046Z ]> 2022-05-18T04:33:31.9432456Z test_default_store_timeout_nccl (__main__.TimeoutTest) 2022-05-18T04:33:32.8484273Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:33:32.8498621Z 2022-05-18T04:33:32.8499157Z Running tests... 2022-05-18T04:33:32.8499642Z ---------------------------------------------------------------------- 2022-05-18T04:33:34.4364400Z test_all_reduce_coalesced_nccl (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:33:34.4755595Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48311 2022-05-18T04:33:34.4861247Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48312 2022-05-18T04:33:35.3964523Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:35.4011618Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:36.9936081Z ok (4.143s) 2022-05-18T04:33:36.9936309Z 2022-05-18T04:33:36.9936724Z ---------------------------------------------------------------------- 2022-05-18T04:33:36.9937431Z Ran 1 test in 4.144s 2022-05-18T04:33:36.9937608Z 2022-05-18T04:33:36.9937704Z OK 2022-05-18T04:33:36.9937851Z 2022-05-18T04:33:36.9937990Z Generating XML reports... 2022-05-18T04:33:36.9980494Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043332.xml 2022-05-18T04:33:38.1845817Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:33:38.1859913Z 2022-05-18T04:33:38.1860154Z Running tests... 
2022-05-18T04:33:38.1860631Z ---------------------------------------------------------------------- 2022-05-18T04:33:39.7702128Z test_broadcast_coalesced_nccl (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:33:39.8093255Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48427 2022-05-18T04:33:39.8199646Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48428 2022-05-18T04:33:40.7147277Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:40.7209741Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:42.3273956Z ok (4.141s) 2022-05-18T04:33:42.3274177Z 2022-05-18T04:33:42.3274564Z ---------------------------------------------------------------------- 2022-05-18T04:33:42.3274912Z Ran 1 test in 4.141s 2022-05-18T04:33:42.3275077Z 2022-05-18T04:33:42.3275174Z OK 2022-05-18T04:33:42.3275312Z 2022-05-18T04:33:42.3275428Z Generating XML reports... 2022-05-18T04:33:42.3318004Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043338.xml 2022-05-18T04:33:43.5185153Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:33:43.5199443Z 2022-05-18T04:33:43.5199651Z Running tests... 2022-05-18T04:33:43.5200089Z ---------------------------------------------------------------------- 2022-05-18T04:33:45.0962906Z test_nccl_barrier (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:33:45.1357022Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48543 2022-05-18T04:33:45.1462910Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48544 2022-05-18T04:33:46.0718311Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:46.1016971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:46.2506847Z skip: Need at least 4 CUDA devices (2.730s) 2022-05-18T04:33:46.2507559Z 2022-05-18T04:33:46.2507984Z ---------------------------------------------------------------------- 2022-05-18T04:33:46.2508315Z Ran 1 test in 2.731s 2022-05-18T04:33:46.2508484Z 2022-05-18T04:33:46.2508598Z OK (skipped=1) 2022-05-18T04:33:46.2508755Z 2022-05-18T04:33:46.2508882Z Generating XML reports... 2022-05-18T04:33:46.2550019Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043343.xml 2022-05-18T04:33:47.4194658Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:33:47.4208519Z 2022-05-18T04:33:47.4208916Z Running tests... 2022-05-18T04:33:47.4209392Z ---------------------------------------------------------------------- 2022-05-18T04:33:49.0002826Z test_nccl_barrier_device_ids (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:33:49.0397325Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48646 2022-05-18T04:33:49.0504878Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48647 2022-05-18T04:33:49.9411247Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:49.9413868Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:33:49.9520201Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:49.9524149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:33:49.9525104Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:49.9618897Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:51.5579589Z ok (4.137s) 2022-05-18T04:33:51.5579965Z 2022-05-18T04:33:51.5580644Z ---------------------------------------------------------------------- 2022-05-18T04:33:51.5581279Z Ran 1 test in 4.137s 2022-05-18T04:33:51.5581594Z 2022-05-18T04:33:51.5581757Z OK 2022-05-18T04:33:51.5581992Z 2022-05-18T04:33:51.5582230Z Generating XML reports... 2022-05-18T04:33:51.5624761Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043347.xml 2022-05-18T04:33:52.7517817Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:33:52.7531528Z 2022-05-18T04:33:52.7531943Z Running tests... 2022-05-18T04:33:52.7532608Z ---------------------------------------------------------------------- 2022-05-18T04:33:54.3333818Z test_nccl_barrier_device_ids_function_argument (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:33:54.3720020Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48762 2022-05-18T04:33:54.3826303Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48763 2022-05-18T04:33:55.2883853Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:55.2886307Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:33:55.3231058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:55.3234730Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:33:55.3235611Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:55.3294956Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:33:55.4868397Z ok (2.733s) 2022-05-18T04:33:55.4868605Z 2022-05-18T04:33:55.4868996Z ---------------------------------------------------------------------- 2022-05-18T04:33:55.4869322Z Ran 1 test in 2.734s 2022-05-18T04:33:55.4869859Z 2022-05-18T04:33:55.4870032Z OK 2022-05-18T04:33:55.4870216Z 2022-05-18T04:33:55.4870350Z Generating XML reports... 
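test_nccl_barrier_device_ids, which passes just above, exercises the device_ids argument of the NCCL barrier. A minimal sketch of that call pattern, assuming one process per GPU launched externally (for example via torchrun) so the rank and rendezvous variables are already in the environment:

    import torch
    import torch.distributed as dist

    dist.init_process_group("nccl")      # rank and world size picked up from the launcher's env vars
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    dist.barrier(device_ids=[rank])      # pin the barrier's NCCL work to this rank's GPU
    dist.destroy_process_group()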
2022-05-18T04:33:55.4912348Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043352.xml 2022-05-18T04:33:56.6567341Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:33:56.6581530Z 2022-05-18T04:33:56.6581832Z Running tests... 2022-05-18T04:33:56.6582485Z ---------------------------------------------------------------------- 2022-05-18T04:33:58.2449061Z test_nccl_barrier_timeout (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:33:58.2833989Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48869 2022-05-18T04:33:58.2940762Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48870 2022-05-18T04:33:59.1955722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:59.1961040Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:59.3982937Z skip: Need at least 4 CUDA devices (2.740s) 2022-05-18T04:33:59.3983361Z 2022-05-18T04:33:59.3984010Z ---------------------------------------------------------------------- 2022-05-18T04:33:59.3984383Z Ran 1 test in 2.740s 2022-05-18T04:33:59.3984874Z 2022-05-18T04:33:59.3984990Z OK (skipped=1) 2022-05-18T04:33:59.3985156Z 2022-05-18T04:33:59.3985285Z Generating XML reports... 2022-05-18T04:33:59.4026454Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043356.xml 2022-05-18T04:34:00.5763968Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:00.5777617Z 2022-05-18T04:34:00.5777826Z Running tests... 2022-05-18T04:34:00.5778275Z ---------------------------------------------------------------------- 2022-05-18T04:34:02.1569681Z test_nccl_barrier_timeout_new_group (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:02.1971074Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48972 2022-05-18T04:34:02.2083704Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48973 2022-05-18T04:34:03.1142905Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:03.1279075Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:03.3126807Z skip: Need at least 4 CUDA devices (2.734s) 2022-05-18T04:34:03.3127283Z 2022-05-18T04:34:03.3127961Z ---------------------------------------------------------------------- 2022-05-18T04:34:03.3128591Z Ran 1 test in 2.735s 2022-05-18T04:34:03.3128865Z 2022-05-18T04:34:03.3129058Z OK (skipped=1) 2022-05-18T04:34:03.3129350Z 2022-05-18T04:34:03.3129569Z Generating XML reports... 2022-05-18T04:34:03.3172312Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043400.xml 2022-05-18T04:34:04.4947809Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:04.4961967Z 2022-05-18T04:34:04.4962245Z Running tests... 2022-05-18T04:34:04.4962689Z ---------------------------------------------------------------------- 2022-05-18T04:34:06.0694059Z test_nccl_barrier_timeout_new_group_non_member (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:06.1087506Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49075 2022-05-18T04:34:06.1193948Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49076 2022-05-18T04:34:07.0591353Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:07.0713219Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:07.2236675Z skip: Need at least 4 CUDA devices (2.727s) 2022-05-18T04:34:07.2237390Z 2022-05-18T04:34:07.2237810Z ---------------------------------------------------------------------- 2022-05-18T04:34:07.2238157Z Ran 1 test in 2.727s 2022-05-18T04:34:07.2238325Z 2022-05-18T04:34:07.2238421Z OK (skipped=1) 2022-05-18T04:34:07.2238673Z 2022-05-18T04:34:07.2238906Z Generating XML reports... 2022-05-18T04:34:07.2280353Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043404.xml 2022-05-18T04:34:08.3568486Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:08.3582163Z 2022-05-18T04:34:08.3582406Z Running tests... 2022-05-18T04:34:08.3582827Z ---------------------------------------------------------------------- 2022-05-18T04:34:09.9382566Z test_nccl_warn_not_in_group_debug_detail (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:09.9776807Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49178 2022-05-18T04:34:09.9886074Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49179 2022-05-18T04:34:10.9290661Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:10.9397395Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:10.9602765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:34:10.9603592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:34:10.9604479Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:10.9605214Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:10.9605770Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:34:10.9609923Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:34:10.9610607Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:34:10.9708414Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:34:12.6965399Z ok (4.338s) 2022-05-18T04:34:12.6965619Z 2022-05-18T04:34:12.6966017Z ---------------------------------------------------------------------- 2022-05-18T04:34:12.6966361Z Ran 1 test in 4.338s 2022-05-18T04:34:12.6966530Z 2022-05-18T04:34:12.6966609Z OK 2022-05-18T04:34:12.6966744Z 2022-05-18T04:34:12.6966878Z Generating XML reports... 
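The test_nccl_warn_not_in_group_* cases above run the same scenario under different TORCH_DISTRIBUTED_DEBUG settings: a collective is invoked through a subgroup by a rank that is not a member of it. A rough sketch of that situation, with the subgroup membership and tensor chosen for illustration; the early return plus warning described in the comment is what these tests appear to check:

    import torch
    import torch.distributed as dist

    # assumes an NCCL process group with world_size == 2 is already initialized
    rank = dist.get_rank()
    subgroup = dist.new_group(ranks=[0])   # rank 1 is deliberately left out

    t = torch.ones(1, device=rank)
    # on the non-member rank this call is expected to return without communicating,
    # emitting a "not in group" warning whose verbosity tracks TORCH_DISTRIBUTED_DEBUG
    dist.all_reduce(t, group=subgroup)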
2022-05-18T04:34:12.7008851Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043408.xml 2022-05-18T04:34:13.8880252Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:13.8893868Z 2022-05-18T04:34:13.8894075Z Running tests... 2022-05-18T04:34:13.8894502Z ---------------------------------------------------------------------- 2022-05-18T04:34:15.4640351Z test_nccl_warn_not_in_group_debug_info (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:15.5032413Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49309 2022-05-18T04:34:15.5138578Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49310 2022-05-18T04:34:16.4244935Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:16.4247679Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:34:16.4262928Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:16.4268032Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:34:16.4268859Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:16.4271297Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:34:16.4350835Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:16.4351590Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:34:16.4352311Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:34:16.4374288Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:34:18.1213573Z ok (4.232s) 2022-05-18T04:34:18.1213749Z 2022-05-18T04:34:18.1214129Z ---------------------------------------------------------------------- 2022-05-18T04:34:18.1214476Z Ran 1 test in 4.232s 2022-05-18T04:34:18.1214645Z 2022-05-18T04:34:18.1214742Z OK 2022-05-18T04:34:18.1214875Z 2022-05-18T04:34:18.1215002Z Generating XML reports... 2022-05-18T04:34:18.1257972Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043413.xml 2022-05-18T04:34:19.2986041Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:19.3000617Z 2022-05-18T04:34:19.3001019Z Running tests... 2022-05-18T04:34:19.3001503Z ---------------------------------------------------------------------- 2022-05-18T04:34:20.8805318Z test_nccl_warn_not_in_group_debug_off (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:20.9192889Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49431 2022-05-18T04:34:20.9297862Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49432 2022-05-18T04:34:21.8211011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:21.8213492Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:34:21.8459560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:21.8463410Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:34:21.8464244Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:21.8466924Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:34:21.8520023Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:21.8532927Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:34:21.8533680Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:34:21.8570321Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:34:23.5380668Z ok (4.238s) 2022-05-18T04:34:23.5380878Z 2022-05-18T04:34:23.5381301Z ---------------------------------------------------------------------- 2022-05-18T04:34:23.5381649Z Ran 1 test in 4.238s 2022-05-18T04:34:23.5381818Z 2022-05-18T04:34:23.5381914Z OK 2022-05-18T04:34:23.5382050Z 2022-05-18T04:34:23.5382165Z Generating XML reports... 2022-05-18T04:34:23.5424803Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043419.xml 2022-05-18T04:34:24.7356641Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:24.7370307Z 2022-05-18T04:34:24.7370755Z Running tests... 2022-05-18T04:34:24.7371228Z ---------------------------------------------------------------------- 2022-05-18T04:34:26.3223965Z test_pass_nccl_options_high_priority_stream (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:26.3616916Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49553 2022-05-18T04:34:26.3723771Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49554 2022-05-18T04:34:27.2834693Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:27.2836940Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:34:27.2947043Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:27.2950851Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:34:27.2951697Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:34:27.2954012Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:34:27.3041807Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:27.3044279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:34:27.3044982Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:34:27.3057104Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:34:28.9798733Z ok (4.242s) 2022-05-18T04:34:28.9798958Z 2022-05-18T04:34:28.9799364Z ---------------------------------------------------------------------- 2022-05-18T04:34:28.9799708Z Ran 1 test in 4.243s 2022-05-18T04:34:28.9799893Z 2022-05-18T04:34:28.9801192Z OK 2022-05-18T04:34:28.9801598Z 2022-05-18T04:34:28.9801968Z Generating XML reports... 2022-05-18T04:34:28.9841730Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043424.xml 2022-05-18T04:34:30.1699577Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:30.1713977Z 2022-05-18T04:34:30.1714165Z Running tests... 2022-05-18T04:34:30.1714661Z ---------------------------------------------------------------------- 2022-05-18T04:34:31.7505633Z test_sequence_num_incremented_nccl_default (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:31.7900171Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49673 2022-05-18T04:34:31.8007411Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49674 2022-05-18T04:34:32.7102981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:32.7111006Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:34:32.7131242Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:32.7140528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:34:32.7141354Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:32.7213878Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:32.7351852Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:34:32.7352602Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:34:32.7353532Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:34:32.7354258Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
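test_pass_nccl_options_high_priority_stream, which finishes in the block above, exercises the NCCL-specific process-group options. A sketch of passing such options, assuming the same per-rank worker context as the previous example (rank comes from the spawning code, the port is arbitrary):

    import torch.distributed as dist

    pg_opts = dist.ProcessGroupNCCL.Options()
    pg_opts.is_high_priority_stream = True  # run NCCL kernels on a high-priority CUDA stream
    dist.init_process_group(
        "nccl",
        init_method="tcp://127.0.0.1:29502",
        rank=rank,
        world_size=2,
        pg_options=pg_opts,
    )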
2022-05-18T04:34:34.4083390Z ok (4.237s) 2022-05-18T04:34:34.4083631Z 2022-05-18T04:34:34.4084032Z ---------------------------------------------------------------------- 2022-05-18T04:34:34.4084404Z Ran 1 test in 4.237s 2022-05-18T04:34:34.4084576Z 2022-05-18T04:34:34.4084676Z OK 2022-05-18T04:34:34.4084819Z 2022-05-18T04:34:34.4084953Z Generating XML reports... 2022-05-18T04:34:34.4127673Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043430.xml 2022-05-18T04:34:35.5961659Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:35.5982323Z 2022-05-18T04:34:35.5982791Z Running tests... 2022-05-18T04:34:35.5983333Z ---------------------------------------------------------------------- 2022-05-18T04:34:37.1451275Z test_sequence_num_incremented_nccl_subgroup (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:37.1843041Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49795 2022-05-18T04:34:37.1949952Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49796 2022-05-18T04:34:38.1280713Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:38.1498085Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:38.2992819Z skip: Need at least 4 CUDA devices (2.701s) 2022-05-18T04:34:38.2993260Z 2022-05-18T04:34:38.2993695Z ---------------------------------------------------------------------- 2022-05-18T04:34:38.2994042Z Ran 1 test in 2.702s 2022-05-18T04:34:38.2994209Z 2022-05-18T04:34:38.2994302Z OK (skipped=1) 2022-05-18T04:34:38.2994461Z 2022-05-18T04:34:38.2994607Z Generating XML reports... 2022-05-18T04:34:38.3036250Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043435.xml 2022-05-18T04:34:39.4706706Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:39.4721337Z 2022-05-18T04:34:39.4721622Z Running tests... 2022-05-18T04:34:39.4722060Z ---------------------------------------------------------------------- 2022-05-18T04:34:41.0593020Z test_sequence_num_set_default_pg_nccl (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:41.0976504Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49898 2022-05-18T04:34:41.1083079Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49899 2022-05-18T04:34:42.0119876Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:42.0128587Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:34:42.0157706Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:42.0169453Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:34:42.0170518Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:42.0231899Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:34:43.6155927Z ok (4.143s) 2022-05-18T04:34:43.6156352Z 2022-05-18T04:34:43.6157086Z ---------------------------------------------------------------------- 2022-05-18T04:34:43.6157480Z Ran 1 test in 4.143s 2022-05-18T04:34:43.6157649Z 2022-05-18T04:34:43.6157743Z OK 2022-05-18T04:34:43.6157881Z 2022-05-18T04:34:43.6158017Z Generating XML reports... 2022-05-18T04:34:43.6199909Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043439.xml 2022-05-18T04:34:44.7897253Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:44.7910877Z 2022-05-18T04:34:44.7911135Z Running tests... 2022-05-18T04:34:44.7911555Z ---------------------------------------------------------------------- 2022-05-18T04:34:46.3399191Z test_sequence_num_set_nccl_new_group (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:46.3783194Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50014 2022-05-18T04:34:46.3888222Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50015 2022-05-18T04:34:47.3087671Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:47.3096530Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:34:47.3224187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:47.3235146Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:34:47.3235951Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:47.3238490Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:34:47.3301452Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:34:47.3303670Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:34:47.3304332Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:34:47.3342270Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:34:48.9963304Z ok (4.205s) 2022-05-18T04:34:48.9963571Z 2022-05-18T04:34:48.9964142Z ---------------------------------------------------------------------- 2022-05-18T04:34:48.9964491Z Ran 1 test in 4.205s 2022-05-18T04:34:48.9964658Z 2022-05-18T04:34:48.9964758Z OK 2022-05-18T04:34:48.9964895Z 2022-05-18T04:34:48.9965031Z Generating XML reports... 2022-05-18T04:34:49.0006876Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043444.xml 2022-05-18T04:34:50.1861079Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:50.1876757Z 2022-05-18T04:34:50.1877093Z Running tests... 2022-05-18T04:34:50.1877509Z ---------------------------------------------------------------------- 2022-05-18T04:34:51.7786877Z test_accumulate_gradients_module (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:51.8181844Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50134 2022-05-18T04:34:51.8289840Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50135 2022-05-18T04:34:52.7417920Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:52.7876799Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:54.0145554Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprbtovzwl 2022-05-18T04:34:54.0146192Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprbtovzwl/_remote_module_non_scriptable.py 2022-05-18T04:34:54.0486510Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf55__x03 2022-05-18T04:34:54.0488475Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf55__x03/_remote_module_non_scriptable.py 2022-05-18T04:34:54.3111060Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:34:54.3112024Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:34:54.6370689Z ok (4.449s) 2022-05-18T04:34:54.6370875Z 2022-05-18T04:34:54.6371250Z ---------------------------------------------------------------------- 2022-05-18T04:34:54.6371587Z Ran 1 test in 4.449s 2022-05-18T04:34:54.6371749Z 2022-05-18T04:34:54.6371863Z OK 2022-05-18T04:34:54.6371998Z 2022-05-18T04:34:54.6372128Z Generating XML reports... 2022-05-18T04:34:54.6415070Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043450.xml 2022-05-18T04:34:55.8226634Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:34:55.8240074Z 2022-05-18T04:34:55.8240354Z Running tests... 2022-05-18T04:34:55.8240794Z ---------------------------------------------------------------------- 2022-05-18T04:34:57.3741603Z test_accumulate_gradients_module_with_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:57.4129452Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50254 2022-05-18T04:34:57.4234879Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50255 2022-05-18T04:34:58.3360723Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:58.3403077Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:59.6392427Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpay3v166f 2022-05-18T04:34:59.6393045Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpay3v166f/_remote_module_non_scriptable.py 2022-05-18T04:34:59.6494206Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz8dsa_ib 2022-05-18T04:34:59.6497558Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz8dsa_ib/_remote_module_non_scriptable.py 2022-05-18T04:34:59.9125914Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:34:59.9132522Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
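The test_accumulate_gradients_module runs above exercise gradient accumulation under DistributedDataParallel; the paired "Reducer buckets have been rebuilt" records are emitted once per process after the first backward pass, when DDP finalizes its bucket assignment. A sketch of the usual accumulation pattern with no_sync(), assuming an initialized process group and a per-rank rank variable; batches is a placeholder iterable:

    import contextlib
    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    model = DDP(nn.Linear(10, 10).cuda(rank), device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for step, batch in enumerate(batches):
        sync = (step % 4 == 3)
        # Suppress the gradient all-reduce on intermediate micro-batches.
        ctx = contextlib.nullcontext() if sync else model.no_sync()
        with ctx:
            model(batch).sum().backward()
        if sync:
            opt.step()
            opt.zero_grad()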
2022-05-18T04:35:00.2316048Z ok (4.407s) 2022-05-18T04:35:00.2316388Z 2022-05-18T04:35:00.2316880Z ---------------------------------------------------------------------- 2022-05-18T04:35:00.2317248Z Ran 1 test in 4.408s 2022-05-18T04:35:00.2317415Z 2022-05-18T04:35:00.2317492Z OK 2022-05-18T04:35:00.2317626Z 2022-05-18T04:35:00.2318491Z Generating XML reports... 2022-05-18T04:35:00.2361808Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043455.xml 2022-05-18T04:35:01.4193782Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:01.4207700Z 2022-05-18T04:35:01.4207942Z Running tests... 2022-05-18T04:35:01.4208400Z ---------------------------------------------------------------------- 2022-05-18T04:35:02.9779093Z test_arbitrary_forward_return_value (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:03.0166407Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50374 2022-05-18T04:35:03.0274203Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50375 2022-05-18T04:35:03.9356333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:03.9693755Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:05.2202547Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr3_4fimp 2022-05-18T04:35:05.2203154Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr3_4fimp/_remote_module_non_scriptable.py 2022-05-18T04:35:05.2219870Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpji8u_edw 2022-05-18T04:35:05.2223062Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpji8u_edw/_remote_module_non_scriptable.py 2022-05-18T04:35:05.4787375Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:05.4787910Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:05.8355372Z ok (4.414s) 2022-05-18T04:35:05.8355621Z 2022-05-18T04:35:05.8356035Z ---------------------------------------------------------------------- 2022-05-18T04:35:05.8356376Z Ran 1 test in 4.415s 2022-05-18T04:35:05.8356540Z 2022-05-18T04:35:05.8356619Z OK 2022-05-18T04:35:05.8356764Z 2022-05-18T04:35:05.8356899Z Generating XML reports... 2022-05-18T04:35:05.8398473Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043501.xml 2022-05-18T04:35:07.0223649Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:07.0236882Z 2022-05-18T04:35:07.0237144Z Running tests... 2022-05-18T04:35:07.0237585Z ---------------------------------------------------------------------- 2022-05-18T04:35:08.5628369Z test_arbitrary_forward_return_value_grad_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:08.6015196Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50494 2022-05-18T04:35:08.6121991Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50495 2022-05-18T04:35:09.5469983Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:09.5804285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:10.8313334Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcoekc50m 2022-05-18T04:35:10.8313950Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcoekc50m/_remote_module_non_scriptable.py 2022-05-18T04:35:10.8614936Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa6ujg9kx 2022-05-18T04:35:10.8617555Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa6ujg9kx/_remote_module_non_scriptable.py 2022-05-18T04:35:11.1113390Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:11.1113933Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:11.4200973Z ok (4.396s) 2022-05-18T04:35:11.4201188Z 2022-05-18T04:35:11.4201598Z ---------------------------------------------------------------------- 2022-05-18T04:35:11.4201943Z Ran 1 test in 4.396s 2022-05-18T04:35:11.4202113Z 2022-05-18T04:35:11.4202188Z OK 2022-05-18T04:35:11.4202333Z 2022-05-18T04:35:11.4202465Z Generating XML reports... 2022-05-18T04:35:11.4244774Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043507.xml 2022-05-18T04:35:12.6114543Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:12.6128331Z 2022-05-18T04:35:12.6128771Z Running tests... 2022-05-18T04:35:12.6129287Z ---------------------------------------------------------------------- 2022-05-18T04:35:12.6134941Z test_bf16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) ... skip: BFloat16 is only supported by CUDA 11+ (0.000s) 2022-05-18T04:35:12.6135307Z 2022-05-18T04:35:12.6135597Z ---------------------------------------------------------------------- 2022-05-18T04:35:12.6135945Z Ran 1 test in 0.001s 2022-05-18T04:35:12.6136113Z 2022-05-18T04:35:12.6136225Z OK (skipped=1) 2022-05-18T04:35:12.6136364Z 2022-05-18T04:35:12.6136490Z Generating XML reports... 2022-05-18T04:35:12.6171137Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043512.xml 2022-05-18T04:35:13.6370957Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:13.6384735Z 2022-05-18T04:35:13.6385110Z Running tests... 2022-05-18T04:35:13.6385595Z ---------------------------------------------------------------------- 2022-05-18T04:35:13.6390994Z test_bf16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) ... skip: BFloat16 is only supported by CUDA 11+ (0.001s) 2022-05-18T04:35:13.6391493Z 2022-05-18T04:35:13.6391816Z ---------------------------------------------------------------------- 2022-05-18T04:35:13.6392149Z Ran 1 test in 0.001s 2022-05-18T04:35:13.6392315Z 2022-05-18T04:35:13.6392426Z OK (skipped=1) 2022-05-18T04:35:13.6392589Z 2022-05-18T04:35:13.6392715Z Generating XML reports... 
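The two bf16_compress_wrapper tests are skipped because BFloat16 collectives need CUDA 11+ on this runner, while the comm-hook tests that follow do run; both families configure DDP gradient-communication hooks. A sketch of registering one of the stock hooks, assuming ddp_model is a DistributedDataParallel instance like the ones constructed above:

    from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

    # fp16_compress_hook casts gradients to half precision before the all-reduce;
    # bf16_compress_hook is the variant the skipped tests would cover.
    ddp_model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)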
2022-05-18T04:35:13.6427383Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043513.xml 2022-05-18T04:35:14.6665452Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:14.6680706Z 2022-05-18T04:35:14.6681127Z Running tests... 2022-05-18T04:35:14.6681619Z ---------------------------------------------------------------------- 2022-05-18T04:35:16.2689579Z test_builtin_ddp_comm_hooks_nccl (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:16.3081859Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50680 2022-05-18T04:35:16.3188839Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50681 2022-05-18T04:35:17.2072468Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:17.2210300Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:18.4860387Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc6hrlt5s 2022-05-18T04:35:18.4861031Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc6hrlt5s/_remote_module_non_scriptable.py 2022-05-18T04:35:18.5232011Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpef7s9dhb 2022-05-18T04:35:18.5233711Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpef7s9dhb/_remote_module_non_scriptable.py 2022-05-18T04:35:18.9264459Z ok (4.258s) 2022-05-18T04:35:18.9264671Z 2022-05-18T04:35:18.9265072Z ---------------------------------------------------------------------- 2022-05-18T04:35:18.9265535Z Ran 1 test in 4.258s 2022-05-18T04:35:18.9265802Z 2022-05-18T04:35:18.9265910Z OK 2022-05-18T04:35:18.9266046Z 2022-05-18T04:35:18.9266181Z Generating XML reports... 2022-05-18T04:35:18.9309165Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043514.xml 2022-05-18T04:35:20.1170852Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:20.1185055Z 2022-05-18T04:35:20.1185469Z Running tests... 2022-05-18T04:35:20.1185978Z ---------------------------------------------------------------------- 2022-05-18T04:35:21.7020613Z test_builtin_ddp_comm_hooks_nccl_grad_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:21.7411899Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50800 2022-05-18T04:35:21.7519822Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50801 2022-05-18T04:35:22.6920206Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:22.7100843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:23.9509889Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3s22au8f 2022-05-18T04:35:23.9510792Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3s22au8f/_remote_module_non_scriptable.py 2022-05-18T04:35:23.9989748Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb6lweq67 2022-05-18T04:35:23.9992119Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb6lweq67/_remote_module_non_scriptable.py 2022-05-18T04:35:24.3596474Z ok (4.241s) 2022-05-18T04:35:24.3598304Z 2022-05-18T04:35:24.3598812Z ---------------------------------------------------------------------- 2022-05-18T04:35:24.3599163Z Ran 1 test in 4.241s 2022-05-18T04:35:24.3599332Z 2022-05-18T04:35:24.3599433Z OK 2022-05-18T04:35:24.3599574Z 2022-05-18T04:35:24.3599705Z Generating XML reports... 2022-05-18T04:35:24.3640055Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043520.xml 2022-05-18T04:35:25.5572442Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:25.5586321Z 2022-05-18T04:35:25.5586554Z Running tests... 2022-05-18T04:35:25.5586994Z ---------------------------------------------------------------------- 2022-05-18T04:35:25.5595114Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2022-05-18T04:35:27.1460263Z Dynamic module can be checkpointed, multiple times, with non-reentrant ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:27.1844651Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50920 2022-05-18T04:35:27.1952017Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50921 2022-05-18T04:35:28.1332170Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:28.1461959Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:29.4117833Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_zx1s52z 2022-05-18T04:35:29.4118680Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_zx1s52z/_remote_module_non_scriptable.py 2022-05-18T04:35:29.4525375Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpll60xh7a 2022-05-18T04:35:29.4527918Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpll60xh7a/_remote_module_non_scriptable.py 2022-05-18T04:35:30.0034029Z ok (4.444s) 2022-05-18T04:35:30.0034402Z 2022-05-18T04:35:30.0035211Z ---------------------------------------------------------------------- 2022-05-18T04:35:30.0035658Z Ran 1 test in 4.445s 2022-05-18T04:35:30.0035831Z 2022-05-18T04:35:30.0035927Z OK 2022-05-18T04:35:30.0036080Z 2022-05-18T04:35:30.0036214Z Generating XML reports... 
2022-05-18T04:35:30.0079071Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043525.xml 2022-05-18T04:35:31.1561686Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:31.1575192Z 2022-05-18T04:35:31.1575609Z Running tests... 2022-05-18T04:35:31.1576543Z ---------------------------------------------------------------------- 2022-05-18T04:35:31.1584311Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:35:32.7227940Z Dynamic module can be checkpointed multiple times with weight sharing ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:32.7617184Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51040 2022-05-18T04:35:32.7725055Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51041 2022-05-18T04:35:33.6814573Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:33.6835176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:34.9861858Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5_d39dxo 2022-05-18T04:35:34.9863086Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5_d39dxo/_remote_module_non_scriptable.py 2022-05-18T04:35:34.9913872Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwlse45ys 2022-05-18T04:35:34.9917204Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwlse45ys/_remote_module_non_scriptable.py 2022-05-18T04:35:35.5805651Z ok (4.423s) 2022-05-18T04:35:35.5805998Z 2022-05-18T04:35:35.5806765Z ---------------------------------------------------------------------- 2022-05-18T04:35:35.5807297Z Ran 1 test in 4.423s 2022-05-18T04:35:35.5807447Z 2022-05-18T04:35:35.5807547Z OK 2022-05-18T04:35:35.5807684Z 2022-05-18T04:35:35.5807842Z Generating XML reports... 2022-05-18T04:35:35.5850682Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043531.xml 2022-05-18T04:35:36.7506037Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:36.7520648Z 2022-05-18T04:35:36.7521133Z Running tests... 2022-05-18T04:35:36.7521624Z ---------------------------------------------------------------------- 2022-05-18T04:35:36.7531943Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:35:38.3005459Z DDP works as expected when layer is checkpointed only once. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:38.3392726Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51160 2022-05-18T04:35:38.3498728Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51161 2022-05-18T04:35:39.2679364Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:39.2713302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:40.5667336Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu6asvrus 2022-05-18T04:35:40.5668376Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu6asvrus/_remote_module_non_scriptable.py 2022-05-18T04:35:40.6042050Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq12k3m3x 2022-05-18T04:35:40.6045125Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq12k3m3x/_remote_module_non_scriptable.py 2022-05-18T04:35:40.8705850Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:40.8706398Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:40.8995675Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:40.8996198Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:40.9143507Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:35:40.9144515Z warnings.warn( 2022-05-18T04:35:40.9145580Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:35:40.9146300Z warnings.warn( 2022-05-18T04:35:40.9251596Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:40.9252094Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:40.9457681Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:40.9458163Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:40.9742168Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:40.9742962Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:40.9991081Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:40.9991571Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
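The test_ddp_checkpointing_* runs combine DistributedDataParallel with activation checkpointing in both reentrant and non-reentrant flavors. A sketch of the non-reentrant variant (use_reentrant=False), assuming an initialized NCCL group and a rank variable:

    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.checkpoint import checkpoint

    class CheckpointedNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(20, 20)

        def forward(self, x):
            # Recompute this layer's activations during backward instead of storing them.
            return checkpoint(self.layer, x, use_reentrant=False)

    model = DDP(CheckpointedNet().cuda(rank), device_ids=[rank])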
2022-05-18T04:35:41.2580608Z ok (4.506s) 2022-05-18T04:35:41.2580847Z 2022-05-18T04:35:41.2581253Z ---------------------------------------------------------------------- 2022-05-18T04:35:41.2581599Z Ran 1 test in 4.506s 2022-05-18T04:35:41.2581767Z 2022-05-18T04:35:41.2581842Z OK 2022-05-18T04:35:41.2583116Z 2022-05-18T04:35:41.2584634Z Generating XML reports... 2022-05-18T04:35:41.2625100Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043536.xml 2022-05-18T04:35:42.4539013Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:42.4554018Z 2022-05-18T04:35:42.4554278Z Running tests... 2022-05-18T04:35:42.4554733Z ---------------------------------------------------------------------- 2022-05-18T04:35:42.4565096Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:35:44.0444407Z DDP works as expected when layer is checkpointed only once. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:44.0840931Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51280 2022-05-18T04:35:44.0948562Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51281 2022-05-18T04:35:45.0079180Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:45.0098947Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:46.3274930Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3cm6bec7 2022-05-18T04:35:46.3276101Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3cm6bec7/_remote_module_non_scriptable.py 2022-05-18T04:35:46.3289405Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6v6tjtrh 2022-05-18T04:35:46.3292138Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6v6tjtrh/_remote_module_non_scriptable.py 2022-05-18T04:35:46.5962098Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:46.5963121Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:46.6264883Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:46.6269661Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:46.6425806Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:35:46.6427435Z warnings.warn( 2022-05-18T04:35:46.6429427Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:35:46.6430710Z warnings.warn( 2022-05-18T04:35:46.6530405Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:35:46.6536093Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:46.6745563Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:46.6750334Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:46.7048865Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:46.7052466Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:46.7305358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:46.7310533Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:47.0032040Z ok (4.547s) 2022-05-18T04:35:47.0032274Z 2022-05-18T04:35:47.0032684Z ---------------------------------------------------------------------- 2022-05-18T04:35:47.0033008Z Ran 1 test in 4.548s 2022-05-18T04:35:47.0033176Z 2022-05-18T04:35:47.0033275Z OK 2022-05-18T04:35:47.0033412Z 2022-05-18T04:35:47.0033552Z Generating XML reports... 2022-05-18T04:35:47.0080899Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043542.xml 2022-05-18T04:35:48.1946177Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:48.1960010Z 2022-05-18T04:35:48.1960259Z Running tests... 2022-05-18T04:35:48.1960708Z ---------------------------------------------------------------------- 2022-05-18T04:35:48.1968915Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:35:49.7258580Z Regardless of reentrant or non-reentrant checkpointing impl, ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:49.7647658Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51400 2022-05-18T04:35:49.7753148Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51401 2022-05-18T04:35:50.6882574Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:50.7113535Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:51.9793972Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2kfdf5lm 2022-05-18T04:35:51.9795142Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2kfdf5lm/_remote_module_non_scriptable.py 2022-05-18T04:35:51.9820107Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5i8o8s07 2022-05-18T04:35:51.9822302Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5i8o8s07/_remote_module_non_scriptable.py 2022-05-18T04:35:52.2574116Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:52.2574682Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:52.2847331Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:52.2849294Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
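The twice_static_graph variants checkpoint the same layer twice per forward pass, which DDP only supports when the graph is declared static (the tests reach this through the internal _set_static_graph call referenced in the warnings above). A sketch using the public constructor argument instead, assuming net is such a doubly-checkpointed module:

    from torch.nn.parallel import DistributedDataParallel as DDP

    # static_graph=True tells DDP that the set of used parameters and the autograd
    # graph do not change across iterations, so repeated checkpointing that produces
    # multiple gradients for the same parameter is handled correctly.
    model = DDP(net.cuda(rank), device_ids=[rank], static_graph=True)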
2022-05-18T04:35:52.5832823Z ok (4.387s) 2022-05-18T04:35:52.5833273Z 2022-05-18T04:35:52.5833688Z ---------------------------------------------------------------------- 2022-05-18T04:35:52.5834036Z Ran 1 test in 4.387s 2022-05-18T04:35:52.5834204Z 2022-05-18T04:35:52.5834282Z OK 2022-05-18T04:35:52.5834419Z 2022-05-18T04:35:52.5834557Z Generating XML reports... 2022-05-18T04:35:52.5877185Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043548.xml 2022-05-18T04:35:53.7592633Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:53.7607119Z 2022-05-18T04:35:53.7607364Z Running tests... 2022-05-18T04:35:53.7607901Z ---------------------------------------------------------------------- 2022-05-18T04:35:53.7616657Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:35:55.3287675Z Regardless of reentrant or non-reentrant checkpointing impl, ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:55.3669360Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51520 2022-05-18T04:35:55.3775155Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51521 2022-05-18T04:35:56.2728305Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:56.3164391Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:57.5661191Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj2h4k1tv 2022-05-18T04:35:57.5662410Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj2h4k1tv/_remote_module_non_scriptable.py 2022-05-18T04:35:57.5722205Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpznkw2y6g 2022-05-18T04:35:57.5725202Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpznkw2y6g/_remote_module_non_scriptable.py 2022-05-18T04:35:57.8479254Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:57.8479777Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:57.8790726Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:57.8791593Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:35:58.1854652Z ok (4.424s) 2022-05-18T04:35:58.1855072Z 2022-05-18T04:35:58.1855589Z ---------------------------------------------------------------------- 2022-05-18T04:35:58.1856028Z Ran 1 test in 4.425s 2022-05-18T04:35:58.1856199Z 2022-05-18T04:35:58.1856298Z OK 2022-05-18T04:35:58.1856439Z 2022-05-18T04:35:58.1856571Z Generating XML reports... 2022-05-18T04:35:58.1899343Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043553.xml 2022-05-18T04:35:59.3682013Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:35:59.3694853Z 2022-05-18T04:35:59.3695004Z Running tests... 2022-05-18T04:35:59.3695895Z ---------------------------------------------------------------------- 2022-05-18T04:35:59.3707095Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:36:00.9204067Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:00.9589660Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51640 2022-05-18T04:36:00.9695563Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51641 2022-05-18T04:36:01.8861118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:01.9006117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:03.1828814Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzm28qwy1 2022-05-18T04:36:03.1830042Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzm28qwy1/_remote_module_non_scriptable.py 2022-05-18T04:36:03.2180575Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy3s0xgd7 2022-05-18T04:36:03.2182770Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy3s0xgd7/_remote_module_non_scriptable.py 2022-05-18T04:36:03.4775920Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:03.4776936Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:03.5009849Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T04:36:03.5011489Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T04:36:03.5343792Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:03.5345058Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:03.8778644Z ok (4.508s) 2022-05-18T04:36:03.8778878Z 2022-05-18T04:36:03.8779283Z ---------------------------------------------------------------------- 2022-05-18T04:36:03.8779625Z Ran 1 test in 4.508s 2022-05-18T04:36:03.8779775Z 2022-05-18T04:36:03.8779870Z OK 2022-05-18T04:36:03.8780005Z 2022-05-18T04:36:03.8780138Z Generating XML reports... 2022-05-18T04:36:03.8822745Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043559.xml 2022-05-18T04:36:05.0705933Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:36:05.0719947Z 2022-05-18T04:36:05.0720193Z Running tests... 
2022-05-18T04:36:05.0720631Z ---------------------------------------------------------------------- 2022-05-18T04:36:05.0731806Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:36:06.6651804Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:06.7041695Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51760 2022-05-18T04:36:06.7148642Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51761 2022-05-18T04:36:07.6156496Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:07.6185508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:08.9019367Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps5lzz3lx 2022-05-18T04:36:08.9020527Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps5lzz3lx/_remote_module_non_scriptable.py 2022-05-18T04:36:08.9209736Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkgknloqm 2022-05-18T04:36:08.9212971Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkgknloqm/_remote_module_non_scriptable.py 2022-05-18T04:36:09.5229266Z ok (4.451s) 2022-05-18T04:36:09.5229501Z 2022-05-18T04:36:09.5229913Z ---------------------------------------------------------------------- 2022-05-18T04:36:09.5230245Z Ran 1 test in 4.451s 2022-05-18T04:36:09.5230414Z 2022-05-18T04:36:09.5230510Z OK 2022-05-18T04:36:09.5231336Z 2022-05-18T04:36:09.5231498Z Generating XML reports... 2022-05-18T04:36:09.5273288Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043605.xml 2022-05-18T04:36:10.7031853Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:36:10.7045512Z 2022-05-18T04:36:10.7045660Z Running tests... 2022-05-18T04:36:10.7046089Z ---------------------------------------------------------------------- 2022-05-18T04:36:10.7053640Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:36:12.2627776Z Checkpointing should work with static graph in the case of checkpointing ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:12.3014784Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51880 2022-05-18T04:36:12.3121902Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51881 2022-05-18T04:36:13.2178715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:13.2534375Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:14.5011730Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2f4f3um9 2022-05-18T04:36:14.5012882Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2f4f3um9/_remote_module_non_scriptable.py 2022-05-18T04:36:14.5059109Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb90v9s8d 2022-05-18T04:36:14.5061772Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb90v9s8d/_remote_module_non_scriptable.py 2022-05-18T04:36:14.7833387Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
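Several of the runs above also emit the reducer.cpp warning that find_unused_parameters=True was set even though every parameter received a gradient; the flag costs an extra traversal of the autograd graph every iteration, so it is only worth setting when the forward pass really can leave parameters unused. Sketch, with net and rank as in the earlier examples:

    from torch.nn.parallel import DistributedDataParallel as DDP

    # Only needed when some parameters may not contribute to the loss in a given
    # iteration; otherwise leave it at the default (False) to skip the per-step scan.
    model = DDP(net.cuda(rank), device_ids=[rank], find_unused_parameters=True)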
2022-05-18T04:36:14.7837744Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:14.8132065Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:14.8138030Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:15.1201794Z ok (4.415s) 2022-05-18T04:36:15.1202265Z 2022-05-18T04:36:15.1202784Z ---------------------------------------------------------------------- 2022-05-18T04:36:15.1203151Z Ran 1 test in 4.416s 2022-05-18T04:36:15.1203326Z 2022-05-18T04:36:15.1203424Z OK 2022-05-18T04:36:15.1203568Z 2022-05-18T04:36:15.1203701Z Generating XML reports... 2022-05-18T04:36:15.1246303Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043610.xml 2022-05-18T04:36:16.3091943Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:36:16.3106422Z 2022-05-18T04:36:16.3106696Z Running tests... 2022-05-18T04:36:16.3107133Z ---------------------------------------------------------------------- 2022-05-18T04:36:16.3118611Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:36:17.8905587Z With reentrant autograd checkpointing impl, DDP will fail when there are ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:17.9298570Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52000 2022-05-18T04:36:17.9406666Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52001 2022-05-18T04:36:18.8185778Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:18.8338715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:20.0914934Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkhj1ngmm 2022-05-18T04:36:20.0915888Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkhj1ngmm/_remote_module_non_scriptable.py 2022-05-18T04:36:20.1168752Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdxx309ud 2022-05-18T04:36:20.1171631Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdxx309ud/_remote_module_non_scriptable.py 2022-05-18T04:36:20.3612737Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T04:36:20.3662955Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2022-05-18T04:36:20.3999929Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:36:20.4000693Z warnings.warn( 2022-05-18T04:36:20.4001769Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:36:20.4002690Z warnings.warn( 2022-05-18T04:36:20.4101314Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:20.4101829Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:20.4595305Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:20.4595779Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:20.7498145Z ok (4.439s) 2022-05-18T04:36:20.7500166Z 2022-05-18T04:36:20.7500721Z ---------------------------------------------------------------------- 2022-05-18T04:36:20.7501225Z Ran 1 test in 4.439s 2022-05-18T04:36:20.7501394Z 2022-05-18T04:36:20.7501493Z OK 2022-05-18T04:36:20.7501612Z 2022-05-18T04:36:20.7501745Z Generating XML reports... 2022-05-18T04:36:20.7543652Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043616.xml 2022-05-18T04:36:21.9386616Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:36:21.9400409Z 2022-05-18T04:36:21.9400552Z Running tests... 2022-05-18T04:36:21.9401256Z ---------------------------------------------------------------------- 2022-05-18T04:36:21.9412216Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:36:23.4840175Z With reentrant autograd checkpointing impl, DDP will fail when there are ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:23.5227144Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52120 2022-05-18T04:36:23.5333378Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52121 2022-05-18T04:36:24.4415695Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:24.4427210Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:25.7300429Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp304yjkc8 2022-05-18T04:36:25.7301013Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp304yjkc8/_remote_module_non_scriptable.py 2022-05-18T04:36:25.7305881Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf6quz55p 2022-05-18T04:36:25.7308962Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf6quz55p/_remote_module_non_scriptable.py 2022-05-18T04:36:25.9907126Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:36:25.9908000Z warnings.warn( 2022-05-18T04:36:25.9909141Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:36:25.9910344Z warnings.warn( 2022-05-18T04:36:26.0200669Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:26.0202678Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:26.0582307Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:26.0582834Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:26.3414545Z ok (4.401s) 2022-05-18T04:36:26.3414778Z 2022-05-18T04:36:26.3415191Z ---------------------------------------------------------------------- 2022-05-18T04:36:26.3415838Z Ran 1 test in 4.401s 2022-05-18T04:36:26.3416008Z 2022-05-18T04:36:26.3416100Z OK 2022-05-18T04:36:26.3416236Z 2022-05-18T04:36:26.3416370Z Generating XML reports... 2022-05-18T04:36:26.3458434Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043621.xml 2022-05-18T04:36:27.5373988Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:36:27.5388647Z 2022-05-18T04:36:27.5388891Z Running tests... 2022-05-18T04:36:27.5389328Z ---------------------------------------------------------------------- 2022-05-18T04:36:27.5402889Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:36:29.1341546Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:29.1736071Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52240 2022-05-18T04:36:29.1843765Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52241 2022-05-18T04:36:30.0925369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:30.1391116Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:31.3795277Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw910gjf6 2022-05-18T04:36:31.3795917Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw910gjf6/_remote_module_non_scriptable.py 2022-05-18T04:36:31.4074867Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz9wxfnw7 2022-05-18T04:36:31.4077488Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz9wxfnw7/_remote_module_non_scriptable.py 2022-05-18T04:36:31.6639138Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:31.6639823Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:31.6961017Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:31.6962009Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:31.9924650Z ok (4.453s) 2022-05-18T04:36:31.9925178Z 2022-05-18T04:36:31.9925995Z ---------------------------------------------------------------------- 2022-05-18T04:36:31.9926624Z Ran 1 test in 4.454s 2022-05-18T04:36:31.9926971Z 2022-05-18T04:36:31.9927144Z OK 2022-05-18T04:36:31.9927772Z 2022-05-18T04:36:31.9928057Z Generating XML reports... 2022-05-18T04:36:31.9972522Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043627.xml 2022-05-18T04:36:33.2193621Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:36:33.2207585Z 2022-05-18T04:36:33.2208006Z Running tests... 2022-05-18T04:36:33.2208489Z ---------------------------------------------------------------------- 2022-05-18T04:36:33.2221221Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:36:34.8148910Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:34.8542355Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52360 2022-05-18T04:36:34.8650740Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52361 2022-05-18T04:36:35.7469082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:35.7964341Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:37.0246050Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp0t5yx25 2022-05-18T04:36:37.0246920Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp0t5yx25/_remote_module_non_scriptable.py 2022-05-18T04:36:37.0436692Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptvz22ayl 2022-05-18T04:36:37.0439521Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptvz22ayl/_remote_module_non_scriptable.py 2022-05-18T04:36:37.3108569Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:37.3109135Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:37.3386038Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:37.3386516Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:37.3577895Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:37.3578378Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:37.3851039Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:37.3851509Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:36:37.6731500Z ok (4.452s) 2022-05-18T04:36:37.6731688Z 2022-05-18T04:36:37.6732081Z ---------------------------------------------------------------------- 2022-05-18T04:36:37.6732402Z Ran 1 test in 4.452s 2022-05-18T04:36:37.6732567Z 2022-05-18T04:36:37.6733909Z OK 2022-05-18T04:36:37.6734088Z 2022-05-18T04:36:37.6734420Z Generating XML reports... 2022-05-18T04:36:37.6777020Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043633.xml 2022-05-18T04:36:38.8810483Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:36:38.8824392Z 2022-05-18T04:36:38.8824808Z Running tests... 2022-05-18T04:36:38.8825282Z ---------------------------------------------------------------------- 2022-05-18T04:36:40.4642967Z test_ddp_comm_hook_allreduce_hook_nccl (__main__.DistributedDataParallelTest) ... 
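The `use_reentrant_True` / `use_reentrant_False` suffixes on the checkpointing tests above refer to the `use_reentrant` flag of `torch.utils.checkpoint.checkpoint`. A standalone sketch of that flag on a hypothetical toy module (the real tests additionally wrap the module in DDP across two ranks and share weights between the checkpointed layers):

```python
# Sketch of the two activation-checkpointing implementations the test names
# refer to (toy module for illustration; not the model the tests build).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(4, 4)
        self.l2 = nn.Linear(4, 4)

    def forward(self, x):
        # Recompute l1's activations during backward instead of storing them.
        # use_reentrant=True selects the legacy autograd.Function-based path;
        # use_reentrant=False selects the newer hook-based path, which is the
        # one that cooperates with DDP when some parameters go unused.
        h = checkpoint(self.l1, x, use_reentrant=False)
        return self.l2(h)

x = torch.randn(2, 4, requires_grad=True)
Tiny()(x).sum().backward()
```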
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:40.5035968Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52480 2022-05-18T04:36:40.5141792Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52481 2022-05-18T04:36:41.4434243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:41.4643539Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:42.7284748Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_7w_ef98 2022-05-18T04:36:42.7285399Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_7w_ef98/_remote_module_non_scriptable.py 2022-05-18T04:36:42.7578061Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpye00pjgv 2022-05-18T04:36:42.7580804Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpye00pjgv/_remote_module_non_scriptable.py 2022-05-18T04:36:43.1217817Z ok (4.239s) 2022-05-18T04:36:43.1218000Z 2022-05-18T04:36:43.1218392Z ---------------------------------------------------------------------- 2022-05-18T04:36:43.1218733Z Ran 1 test in 4.239s 2022-05-18T04:36:43.1218908Z 2022-05-18T04:36:43.1219004Z OK 2022-05-18T04:36:43.1219143Z 2022-05-18T04:36:43.1219271Z Generating XML reports... 2022-05-18T04:36:43.1261285Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043638.xml 2022-05-18T04:36:44.3070651Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:36:44.3084406Z 2022-05-18T04:36:44.3084716Z Running tests... 2022-05-18T04:36:44.3085153Z ---------------------------------------------------------------------- 2022-05-18T04:36:45.8772651Z test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:45.9158486Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52600 2022-05-18T04:36:45.9265786Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52601 2022-05-18T04:36:46.8266326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:46.8347955Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:48.1137972Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxhek3n1u 2022-05-18T04:36:48.1138843Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxhek3n1u/_remote_module_non_scriptable.py 2022-05-18T04:36:48.1387181Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph145lw70 2022-05-18T04:36:48.1390043Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph145lw70/_remote_module_non_scriptable.py 2022-05-18T04:36:48.5341146Z ok (4.225s) 2022-05-18T04:36:48.5341366Z 2022-05-18T04:36:48.5341766Z ---------------------------------------------------------------------- 2022-05-18T04:36:48.5342105Z Ran 1 test in 4.226s 2022-05-18T04:36:48.5342254Z 2022-05-18T04:36:48.5342349Z OK 2022-05-18T04:36:48.5343483Z 2022-05-18T04:36:48.5344027Z Generating XML reports... 
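test_ddp_comm_hook_allreduce_hook_nccl and its `_grad_is_view` variant above exercise DDP's gradient-communication hook API. A hedged sketch of a hook roughly equivalent to the built-in allreduce hook, and of how it would be registered; it assumes an already-initialized process group and an existing DDP-wrapped model named `ddp`:

```python
# Sketch of a DDP communication hook that averages gradient buckets with
# allreduce, roughly what default_hooks.allreduce_hook does (assumes an
# already-initialized process group and an existing DDP model `ddp`).
import torch
import torch.distributed as dist

def allreduce_hook(state, bucket: dist.GradBucket) -> torch.futures.Future[torch.Tensor]:
    group = state if state is not None else dist.group.WORLD
    # Divide in place so the allreduce produces an average, then kick off the
    # asynchronous collective on the flattened bucket tensor.
    tensor = bucket.buffer().div_(dist.get_world_size(group))
    fut = dist.all_reduce(tensor, group=group, async_op=True).get_future()
    # The hook must return a Future whose value is the reduced bucket tensor.
    return fut.then(lambda f: f.value()[0])

# ddp.register_comm_hook(state=None, hook=allreduce_hook)
```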
2022-05-18T04:36:48.5384841Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043644.xml 2022-05-18T04:36:49.7203511Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:36:49.7217181Z 2022-05-18T04:36:49.7217411Z Running tests... 2022-05-18T04:36:49.7218084Z ---------------------------------------------------------------------- 2022-05-18T04:36:51.3003918Z test_ddp_comm_hook_allreduce_hook_nccl_static_graph (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:51.3396203Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52720 2022-05-18T04:36:51.3503120Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52721 2022-05-18T04:36:52.2464074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:52.2791604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:53.5210866Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm515_1do 2022-05-18T04:36:53.5212494Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm515_1do/_remote_module_non_scriptable.py 2022-05-18T04:36:53.5281909Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqky00m8e 2022-05-18T04:36:53.5284695Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqky00m8e/_remote_module_non_scriptable.py 2022-05-18T04:36:53.8577120Z ok (4.136s) 2022-05-18T04:36:53.8577339Z 2022-05-18T04:36:53.8577741Z ---------------------------------------------------------------------- 2022-05-18T04:36:53.8578289Z Ran 1 test in 4.136s 2022-05-18T04:36:53.8578459Z 2022-05-18T04:36:53.8578562Z OK 2022-05-18T04:36:53.8578701Z 2022-05-18T04:36:53.8578834Z Generating XML reports... 2022-05-18T04:36:53.8620505Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043649.xml 2022-05-18T04:36:55.0316864Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:36:55.0330132Z 2022-05-18T04:36:55.0330388Z Running tests... 2022-05-18T04:36:55.0331013Z ---------------------------------------------------------------------- 2022-05-18T04:36:55.0343984Z test_ddp_comm_hook_allreduce_with_then_hook_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:36:56.5761240Z This unit test verifies whether a DDP communication hook that calls allreduce and then ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:56.6146520Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52840 2022-05-18T04:36:56.6252098Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52841 2022-05-18T04:36:57.5389025Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:57.5692621Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:58.8186939Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptxq09x84 2022-05-18T04:36:58.8187944Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptxq09x84/_remote_module_non_scriptable.py 2022-05-18T04:36:58.8265725Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3tyukt9f 2022-05-18T04:36:58.8268819Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3tyukt9f/_remote_module_non_scriptable.py 2022-05-18T04:36:59.2326949Z ok (4.199s) 2022-05-18T04:36:59.2327137Z 2022-05-18T04:36:59.2327553Z ---------------------------------------------------------------------- 2022-05-18T04:36:59.2327909Z Ran 1 test in 4.200s 2022-05-18T04:36:59.2328075Z 2022-05-18T04:36:59.2328170Z OK 2022-05-18T04:36:59.2328304Z 2022-05-18T04:36:59.2328436Z Generating XML reports... 2022-05-18T04:36:59.2372032Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043655.xml 2022-05-18T04:37:00.4061739Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:00.4075229Z 2022-05-18T04:37:00.4075674Z Running tests... 2022-05-18T04:37:00.4076190Z ---------------------------------------------------------------------- 2022-05-18T04:37:00.4084216Z test_ddp_comm_hook_future_passing_gpu_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:37:01.9653530Z This unit test verifies whether the Future object is passed properly using nccl backend. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:02.0049152Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52960 2022-05-18T04:37:02.0157829Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52961 2022-05-18T04:37:02.9016552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:02.9097114Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:04.1802321Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgt1nz0tn 2022-05-18T04:37:04.1803621Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgt1nz0tn/_remote_module_non_scriptable.py 2022-05-18T04:37:04.2141032Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpux_xqaa3 2022-05-18T04:37:04.2144390Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpux_xqaa3/_remote_module_non_scriptable.py 2022-05-18T04:37:04.6234258Z ok (4.216s) 2022-05-18T04:37:04.6234450Z 2022-05-18T04:37:04.6234861Z ---------------------------------------------------------------------- 2022-05-18T04:37:04.6235213Z Ran 1 test in 4.216s 2022-05-18T04:37:04.6235380Z 2022-05-18T04:37:04.6235476Z OK 2022-05-18T04:37:04.6235611Z 2022-05-18T04:37:04.6235724Z Generating XML reports... 
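test_ddp_comm_hook_allreduce_with_then_hook_nccl above chains an extra step onto the allreduce future with another `.then()` call, and test_ddp_comm_hook_future_passing_gpu_nccl checks that whatever Future the hook returns is what DDP actually consumes. A sketch of the simplest hook of that kind, one that skips communication entirely and hands back an already-completed Future (illustrative only; with this hook gradients are left unreduced):

```python
# Sketch of a "future passing" style hook: no collective is issued, DDP just
# receives a completed Future holding the untouched bucket tensor.
import torch
import torch.distributed as dist

def passthrough_hook(state, bucket: dist.GradBucket) -> torch.futures.Future[torch.Tensor]:
    fut: torch.futures.Future[torch.Tensor] = torch.futures.Future()
    fut.set_result(bucket.buffer())
    return fut

# ddp.register_comm_hook(state=None, hook=passthrough_hook)
```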
2022-05-18T04:37:04.6279043Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043700.xml 2022-05-18T04:37:05.8154312Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:05.8167737Z 2022-05-18T04:37:05.8168142Z Running tests... 2022-05-18T04:37:05.8168675Z ---------------------------------------------------------------------- 2022-05-18T04:37:07.3596277Z test_ddp_multi_device_module_config (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:07.3989149Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53080 2022-05-18T04:37:07.4101602Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53081 2022-05-18T04:37:08.3219615Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:08.3372293Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:08.5142910Z skip: Need at least 4 CUDA devices (2.697s) 2022-05-18T04:37:08.5143167Z 2022-05-18T04:37:08.5143555Z ---------------------------------------------------------------------- 2022-05-18T04:37:08.5144084Z Ran 1 test in 2.697s 2022-05-18T04:37:08.5144252Z 2022-05-18T04:37:08.5144366Z OK (skipped=1) 2022-05-18T04:37:08.5144546Z 2022-05-18T04:37:08.5144679Z Generating XML reports... 2022-05-18T04:37:08.5187460Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043705.xml 2022-05-18T04:37:09.6962916Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:09.6977493Z 2022-05-18T04:37:09.6977930Z Running tests... 2022-05-18T04:37:09.6978413Z ---------------------------------------------------------------------- 2022-05-18T04:37:11.2919485Z test_ddp_weight_sharing (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:11.3313937Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53183 2022-05-18T04:37:11.3421861Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53184 2022-05-18T04:37:12.3072200Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:12.3086169Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:13.6167119Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsr3k4c4c 2022-05-18T04:37:13.6168176Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsr3k4c4c/_remote_module_non_scriptable.py 2022-05-18T04:37:13.6433968Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc78olyf0 2022-05-18T04:37:13.6436396Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc78olyf0/_remote_module_non_scriptable.py 2022-05-18T04:37:13.7385119Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:37:13.7385654Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:37:13.7957058Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:37:13.7958201Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:37:13.8518020Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:37:13.8518630Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:37:13.9074554Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:37:13.9075055Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:37:14.2503611Z ok (4.552s) 2022-05-18T04:37:14.2503803Z 2022-05-18T04:37:14.2504198Z ---------------------------------------------------------------------- 2022-05-18T04:37:14.2504735Z Ran 1 test in 4.553s 2022-05-18T04:37:14.2504918Z 2022-05-18T04:37:14.2505024Z OK 2022-05-18T04:37:14.2505160Z 2022-05-18T04:37:14.2505295Z Generating XML reports... 2022-05-18T04:37:14.2548455Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043709.xml 2022-05-18T04:37:15.4053275Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:15.4067860Z 2022-05-18T04:37:15.4068226Z Running tests... 2022-05-18T04:37:15.4068690Z ---------------------------------------------------------------------- 2022-05-18T04:37:16.9764287Z test_ddp_with_lazy_parameters (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:17.0159312Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53303 2022-05-18T04:37:17.0268132Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53304 2022-05-18T04:37:17.9491631Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:17.9498234Z /opt/conda/lib/python3.9/site-packages/torch/nn/modules/lazy.py:178: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment. 2022-05-18T04:37:17.9498938Z warnings.warn('Lazy modules are a new feature under heavy development ' 2022-05-18T04:37:17.9542301Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:17.9551036Z /opt/conda/lib/python3.9/site-packages/torch/nn/modules/lazy.py:178: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment. 2022-05-18T04:37:17.9551698Z warnings.warn('Lazy modules are a new feature under heavy development ' 2022-05-18T04:37:17.9592743Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjev4wio_ 2022-05-18T04:37:17.9595241Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjev4wio_/_remote_module_non_scriptable.py 2022-05-18T04:37:17.9649403Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwlhe7lvj 2022-05-18T04:37:17.9652523Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwlhe7lvj/_remote_module_non_scriptable.py 2022-05-18T04:37:18.1309660Z ok (2.724s) 2022-05-18T04:37:18.1309845Z 2022-05-18T04:37:18.1310227Z ---------------------------------------------------------------------- 2022-05-18T04:37:18.1310567Z Ran 1 test in 2.724s 2022-05-18T04:37:18.1310739Z 2022-05-18T04:37:18.1310839Z OK 2022-05-18T04:37:18.1310975Z 2022-05-18T04:37:18.1311115Z Generating XML reports... 
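The "Lazy modules are a new feature" warning above is emitted as soon as a lazy module such as `nn.LazyLinear` is constructed; test_ddp_with_lazy_parameters covers how DDP treats parameters that have not been materialized yet (my reading is that DDP rejects uninitialized parameters, so a dry-run forward is needed before wrapping). A small sketch of lazy materialization on its own:

```python
# Sketch: lazy modules defer parameter shapes until the first forward pass,
# which is why constructing one prints the UserWarning seen in the log.
import torch
import torch.nn as nn

lazy = nn.LazyLinear(out_features=8)   # emits the "Lazy modules" UserWarning
print(lazy.weight)                     # still an UninitializedParameter here

lazy(torch.randn(2, 16))               # first forward materializes 16 -> 8
print(lazy.weight.shape)               # torch.Size([8, 16])
```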
2022-05-18T04:37:18.1354777Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043715.xml 2022-05-18T04:37:19.3010062Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:19.3024472Z 2022-05-18T04:37:19.3024880Z Running tests... 2022-05-18T04:37:19.3025377Z ---------------------------------------------------------------------- 2022-05-18T04:37:20.8822253Z test_default_ddp_comm_hooks_nccl (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:20.9216722Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53410 2022-05-18T04:37:20.9323833Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53411 2022-05-18T04:37:21.8436142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:21.8762554Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:23.1178195Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt4vlijkd 2022-05-18T04:37:23.1178802Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt4vlijkd/_remote_module_non_scriptable.py 2022-05-18T04:37:23.1223278Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppbjzyyol 2022-05-18T04:37:23.1226843Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppbjzyyol/_remote_module_non_scriptable.py 2022-05-18T04:37:23.5399357Z ok (4.237s) 2022-05-18T04:37:23.5399577Z 2022-05-18T04:37:23.5399970Z ---------------------------------------------------------------------- 2022-05-18T04:37:23.5400313Z Ran 1 test in 4.238s 2022-05-18T04:37:23.5400481Z 2022-05-18T04:37:23.5400578Z OK 2022-05-18T04:37:23.5400712Z 2022-05-18T04:37:23.5400847Z Generating XML reports... 2022-05-18T04:37:23.5443003Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043719.xml 2022-05-18T04:37:24.7123534Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:24.7137275Z 2022-05-18T04:37:24.7137532Z Running tests... 2022-05-18T04:37:24.7137968Z ---------------------------------------------------------------------- 2022-05-18T04:37:26.2768316Z test_default_ddp_comm_hooks_nccl_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:26.3157640Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53530 2022-05-18T04:37:26.3265722Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53531 2022-05-18T04:37:27.2421708Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:27.2446362Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:28.5151479Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn4uyvers 2022-05-18T04:37:28.5152357Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn4uyvers/_remote_module_non_scriptable.py 2022-05-18T04:37:28.5251190Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgq231tf3 2022-05-18T04:37:28.5254575Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgq231tf3/_remote_module_non_scriptable.py 2022-05-18T04:37:28.9340288Z ok (4.220s) 2022-05-18T04:37:28.9340742Z 2022-05-18T04:37:28.9341493Z ---------------------------------------------------------------------- 2022-05-18T04:37:28.9341882Z Ran 1 test in 4.220s 2022-05-18T04:37:28.9342031Z 2022-05-18T04:37:28.9342129Z OK 2022-05-18T04:37:28.9342263Z 2022-05-18T04:37:28.9342396Z Generating XML reports... 2022-05-18T04:37:28.9385166Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043724.xml 2022-05-18T04:37:30.1330008Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:30.1344604Z 2022-05-18T04:37:30.1344873Z Running tests... 2022-05-18T04:37:30.1345309Z ---------------------------------------------------------------------- 2022-05-18T04:37:31.7269891Z test_failure_recovery (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:31.7664143Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53650 2022-05-18T04:37:31.7771735Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53651 2022-05-18T04:37:32.6864867Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:32.6904528Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:33.9532136Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmiyllewy 2022-05-18T04:37:33.9532756Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmiyllewy/_remote_module_non_scriptable.py 2022-05-18T04:37:33.9968187Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw2vqf0t6 2022-05-18T04:37:33.9970591Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw2vqf0t6/_remote_module_non_scriptable.py 2022-05-18T04:37:34.2536709Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:37:34.2537246Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:37:34.3045034Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:37:34.3045545Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:37:34.5851957Z ok (4.450s) 2022-05-18T04:37:34.5852180Z 2022-05-18T04:37:34.5852591Z ---------------------------------------------------------------------- 2022-05-18T04:37:34.5852914Z Ran 1 test in 4.451s 2022-05-18T04:37:34.5853421Z 2022-05-18T04:37:34.5853519Z OK 2022-05-18T04:37:34.5853652Z 2022-05-18T04:37:34.5853784Z Generating XML reports... 2022-05-18T04:37:34.5896637Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043730.xml 2022-05-18T04:37:35.7810493Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:35.7825062Z 2022-05-18T04:37:35.7825516Z Running tests... 2022-05-18T04:37:35.7826003Z ---------------------------------------------------------------------- 2022-05-18T04:37:37.3626596Z test_find_unused_parameters_kwarg_debug_detail (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:37.4012310Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53781 2022-05-18T04:37:37.4118839Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53782 2022-05-18T04:37:38.3242294Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:38.3440906Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:38.3560620Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:37:38.3561151Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:37:38.3561959Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:37:38.3562656Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:37:39.6152631Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa2171e4k 2022-05-18T04:37:39.6153890Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa2171e4k/_remote_module_non_scriptable.py 2022-05-18T04:37:39.6577676Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm84u5muc 2022-05-18T04:37:39.6580123Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm84u5muc/_remote_module_non_scriptable.py 2022-05-18T04:37:40.2197904Z ok (4.437s) 2022-05-18T04:37:40.2198104Z 2022-05-18T04:37:40.2198508Z ---------------------------------------------------------------------- 2022-05-18T04:37:40.2198851Z Ran 1 test in 4.437s 2022-05-18T04:37:40.2199022Z 2022-05-18T04:37:40.2199125Z OK 2022-05-18T04:37:40.2199247Z 2022-05-18T04:37:40.2199377Z Generating XML reports... 2022-05-18T04:37:40.2242342Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043735.xml 2022-05-18T04:37:41.4120248Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:41.4133998Z 2022-05-18T04:37:41.4134148Z Running tests... 2022-05-18T04:37:41.4134640Z ---------------------------------------------------------------------- 2022-05-18T04:37:43.0040562Z test_find_unused_parameters_kwarg_debug_info (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:43.0434935Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53907 2022-05-18T04:37:43.0541217Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53908 2022-05-18T04:37:43.9482965Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:43.9492707Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:37:43.9636588Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:43.9646639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:37:43.9647452Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:37:43.9697886Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:37:45.2449206Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpllwvelkt 2022-05-18T04:37:45.2449860Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpllwvelkt/_remote_module_non_scriptable.py 2022-05-18T04:37:45.2516667Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq4unq9t7 2022-05-18T04:37:45.2519479Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq4unq9t7/_remote_module_non_scriptable.py 2022-05-18T04:37:45.8621933Z ok (4.448s) 2022-05-18T04:37:45.8622156Z 2022-05-18T04:37:45.8622555Z ---------------------------------------------------------------------- 2022-05-18T04:37:45.8622897Z Ran 1 test in 4.449s 2022-05-18T04:37:45.8623064Z 2022-05-18T04:37:45.8623141Z OK 2022-05-18T04:37:45.8623284Z 2022-05-18T04:37:45.8623421Z Generating XML reports... 2022-05-18T04:37:45.8665630Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043741.xml 2022-05-18T04:37:47.0604812Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:47.0619059Z 2022-05-18T04:37:47.0619457Z Running tests... 2022-05-18T04:37:47.0619981Z ---------------------------------------------------------------------- 2022-05-18T04:37:48.6362792Z test_find_unused_parameters_kwarg_debug_off (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:48.6757805Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54027 2022-05-18T04:37:48.6865919Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54028 2022-05-18T04:37:49.5941370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:49.5950880Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:37:49.5974100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:49.5985039Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:37:49.5986044Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:37:49.6053892Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:37:50.8675085Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpecxnc327 2022-05-18T04:37:50.8675763Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpecxnc327/_remote_module_non_scriptable.py 2022-05-18T04:37:50.8911084Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa4s3j87e 2022-05-18T04:37:50.8913893Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa4s3j87e/_remote_module_non_scriptable.py 2022-05-18T04:37:51.4947114Z ok (4.432s) 2022-05-18T04:37:51.4947457Z 2022-05-18T04:37:51.4948327Z ---------------------------------------------------------------------- 2022-05-18T04:37:51.4949072Z Ran 1 test in 4.433s 2022-05-18T04:37:51.4949263Z 2022-05-18T04:37:51.4949359Z OK 2022-05-18T04:37:51.4949496Z 2022-05-18T04:37:51.4949614Z Generating XML reports... 2022-05-18T04:37:51.4991743Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043747.xml 2022-05-18T04:37:52.6913718Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:52.6927017Z 2022-05-18T04:37:52.6927264Z Running tests... 2022-05-18T04:37:52.6928223Z ---------------------------------------------------------------------- 2022-05-18T04:37:54.2770028Z test_find_unused_parameters_kwarg_grad_is_view_debug_detail (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:54.3167616Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54147 2022-05-18T04:37:54.3276154Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54148 2022-05-18T04:37:55.3510101Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:55.3646730Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:55.3830732Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:37:55.3831923Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:37:55.3832829Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:37:55.3833530Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:37:56.6835088Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi1v1oq3q 2022-05-18T04:37:56.6835984Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi1v1oq3q/_remote_module_non_scriptable.py 2022-05-18T04:37:56.6997755Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppnbxr1u1 2022-05-18T04:37:56.7000811Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppnbxr1u1/_remote_module_non_scriptable.py 2022-05-18T04:37:57.2360159Z ok (4.543s) 2022-05-18T04:37:57.2360587Z 2022-05-18T04:37:57.2361349Z ---------------------------------------------------------------------- 2022-05-18T04:37:57.2361688Z Ran 1 test in 4.543s 2022-05-18T04:37:57.2361853Z 2022-05-18T04:37:57.2361948Z OK 2022-05-18T04:37:57.2362085Z 2022-05-18T04:37:57.2362221Z Generating XML reports... 
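The `_debug_detail` / `_debug_info` / `_debug_off` suffixes on the test_find_unused_parameters_kwarg_* runs above correspond to the three TORCH_DISTRIBUTED_DEBUG levels, and the "kwarg" is DDP's `find_unused_parameters` argument. A single-process sketch, assuming a gloo group and a toy model with a deliberately unused branch instead of the two-rank NCCL setup the tests use:

```python
# Sketch of what the test_find_unused_parameters_kwarg_*_debug_* variants
# exercise: a model with a branch that never contributes to the loss, wrapped
# with find_unused_parameters=True, run under one TORCH_DISTRIBUTED_DEBUG level.
import os
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"   # the "_debug_detail" variant
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")      # any free port

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("gloo", rank=0, world_size=1)

class HasUnused(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.used = torch.nn.Linear(4, 4)
        self.unused = torch.nn.Linear(4, 4)   # never called in forward

    def forward(self, x):
        return self.used(x)

ddp = DDP(HasUnused(), find_unused_parameters=True)
ddp(torch.randn(2, 4)).sum().backward()       # DDP marks the unused branch ready
dist.destroy_process_group()
```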
2022-05-18T04:37:57.2404387Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043752.xml 2022-05-18T04:37:58.4225127Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:37:58.4239093Z 2022-05-18T04:37:58.4239319Z Running tests... 2022-05-18T04:37:58.4240239Z ---------------------------------------------------------------------- 2022-05-18T04:38:00.0110648Z test_find_unused_parameters_kwarg_grad_is_view_debug_info (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:00.0498152Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54273 2022-05-18T04:38:00.0605881Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54274 2022-05-18T04:38:00.9570963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:00.9581693Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:00.9654652Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:00.9667142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:00.9667959Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:00.9684766Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:02.2718146Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzksxmtgk 2022-05-18T04:38:02.2718798Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzksxmtgk/_remote_module_non_scriptable.py 2022-05-18T04:38:02.2997817Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjums7rrx 2022-05-18T04:38:02.3000209Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjums7rrx/_remote_module_non_scriptable.py 2022-05-18T04:38:02.8685056Z ok (4.444s) 2022-05-18T04:38:02.8685290Z 2022-05-18T04:38:02.8685699Z ---------------------------------------------------------------------- 2022-05-18T04:38:02.8686030Z Ran 1 test in 4.445s 2022-05-18T04:38:02.8686198Z 2022-05-18T04:38:02.8686297Z OK 2022-05-18T04:38:02.8686433Z 2022-05-18T04:38:02.8686573Z Generating XML reports... 2022-05-18T04:38:02.8728553Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043758.xml 2022-05-18T04:38:04.0511096Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:38:04.0524603Z 2022-05-18T04:38:04.0524817Z Running tests... 2022-05-18T04:38:04.0525247Z ---------------------------------------------------------------------- 2022-05-18T04:38:05.6089899Z test_find_unused_parameters_kwarg_grad_is_view_debug_off (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:05.6482316Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54393 2022-05-18T04:38:05.6589396Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54394 2022-05-18T04:38:06.6059530Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:06.6070194Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:06.6123241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:06.6135443Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:06.6136268Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:06.6172949Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:07.9144665Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpre2kqs61 2022-05-18T04:38:07.9145316Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpre2kqs61/_remote_module_non_scriptable.py 2022-05-18T04:38:07.9176706Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo8yq_75c 2022-05-18T04:38:07.9179507Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo8yq_75c/_remote_module_non_scriptable.py 2022-05-18T04:38:08.4671053Z ok (4.414s) 2022-05-18T04:38:08.4671368Z 2022-05-18T04:38:08.4671757Z ---------------------------------------------------------------------- 2022-05-18T04:38:08.4672377Z Ran 1 test in 4.415s 2022-05-18T04:38:08.4672564Z 2022-05-18T04:38:08.4672662Z OK 2022-05-18T04:38:08.4672801Z 2022-05-18T04:38:08.4672938Z Generating XML reports... 2022-05-18T04:38:08.4715044Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043804.xml 2022-05-18T04:38:09.6599666Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:38:09.6613823Z 2022-05-18T04:38:09.6614169Z Running tests... 2022-05-18T04:38:09.6614617Z ---------------------------------------------------------------------- 2022-05-18T04:38:11.2478066Z test_fp16 (__main__.DistributedDataParallelTest) ... 
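The `_grad_is_view` variants above toggle DDP's `gradient_as_bucket_view` argument, which makes each `param.grad` a view into the flattened communication bucket rather than a separately allocated tensor, saving one copy and some memory. A short sketch, assuming a process group is already initialized as in the earlier sketches:

```python
# Sketch: gradient_as_bucket_view=True is the "grad is view" mode in the test
# names above (assumes dist.init_process_group(...) has already been called).
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

model = torch.nn.Linear(8, 8)
ddp = DDP(model, gradient_as_bucket_view=True)

ddp(torch.randn(2, 8)).sum().backward()
# After backward, model.weight.grad aliases storage inside DDP's gradient
# bucket instead of being an independent tensor.
```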
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:11.2874195Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54513 2022-05-18T04:38:11.2982461Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54514 2022-05-18T04:38:12.2049014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:12.2062470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:13.5050177Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf6iip1da 2022-05-18T04:38:13.5051018Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf6iip1da/_remote_module_non_scriptable.py 2022-05-18T04:38:13.5146731Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7ag051t4 2022-05-18T04:38:13.5149716Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7ag051t4/_remote_module_non_scriptable.py 2022-05-18T04:38:14.1061593Z ok (4.444s) 2022-05-18T04:38:14.1061827Z 2022-05-18T04:38:14.1062248Z ---------------------------------------------------------------------- 2022-05-18T04:38:14.1062583Z Ran 1 test in 4.445s 2022-05-18T04:38:14.1062757Z 2022-05-18T04:38:14.1062852Z OK 2022-05-18T04:38:14.1064016Z 2022-05-18T04:38:14.1065657Z Generating XML reports... 2022-05-18T04:38:14.1105565Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043809.xml 2022-05-18T04:38:15.2991773Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:38:15.3005886Z 2022-05-18T04:38:15.3006117Z Running tests... 2022-05-18T04:38:15.3006587Z ---------------------------------------------------------------------- 2022-05-18T04:38:16.8733888Z test_fp16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:16.9128876Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54633 2022-05-18T04:38:16.9236564Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54634 2022-05-18T04:38:17.8338739Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:17.8341201Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:17.8355752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:17.8358860Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:19.1233812Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwi1sgfs7 2022-05-18T04:38:19.1234662Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwi1sgfs7/_remote_module_non_scriptable.py 2022-05-18T04:38:19.1519351Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkp6g511v 2022-05-18T04:38:19.1521956Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkp6g511v/_remote_module_non_scriptable.py 2022-05-18T04:38:19.5313207Z ok (4.230s) 2022-05-18T04:38:19.5313426Z 2022-05-18T04:38:19.5313868Z ---------------------------------------------------------------------- 2022-05-18T04:38:19.5314198Z Ran 1 test in 4.231s 2022-05-18T04:38:19.5314368Z 2022-05-18T04:38:19.5314472Z OK 2022-05-18T04:38:19.5314607Z 2022-05-18T04:38:19.5314743Z Generating XML reports... 2022-05-18T04:38:19.5356227Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043815.xml 2022-05-18T04:38:20.7125150Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:38:20.7138741Z 2022-05-18T04:38:20.7139214Z Running tests... 2022-05-18T04:38:20.7139972Z ---------------------------------------------------------------------- 2022-05-18T04:38:22.2676124Z test_fp16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:22.3065130Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54753 2022-05-18T04:38:22.3170556Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54754 2022-05-18T04:38:23.2118520Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:23.2119929Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:23.2185510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:23.2188855Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:24.4591946Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl32ua0z2 2022-05-18T04:38:24.4592983Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl32ua0z2/_remote_module_non_scriptable.py 2022-05-18T04:38:24.5115408Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4kmtyat0 2022-05-18T04:38:24.5118313Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4kmtyat0/_remote_module_non_scriptable.py 2022-05-18T04:38:24.9246055Z ok (4.210s) 2022-05-18T04:38:24.9246276Z 2022-05-18T04:38:24.9246704Z ---------------------------------------------------------------------- 2022-05-18T04:38:24.9247050Z Ran 1 test in 4.211s 2022-05-18T04:38:24.9247222Z 2022-05-18T04:38:24.9247306Z OK 2022-05-18T04:38:24.9247443Z 2022-05-18T04:38:24.9247577Z Generating XML reports... 2022-05-18T04:38:24.9290171Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043820.xml 2022-05-18T04:38:26.1230808Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:38:26.1245310Z 2022-05-18T04:38:26.1245503Z Running tests... 2022-05-18T04:38:26.1245941Z ---------------------------------------------------------------------- 2022-05-18T04:38:27.6999506Z test_fp16_grad_is_view (__main__.DistributedDataParallelTest) ... 
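The PowerSGD config lines above are printed while test_fp16_compress_wrapper_* register a PowerSGD hook wrapped in FP16 compression. A hedged sketch of the hook setup those wrapper tests point at, assuming an already-initialized process group and a DDP model named `ddp`; the registration calls are shown commented out for that reason:

```python
# Sketch of registering an fp16-compressed PowerSGD hook (assumes an
# initialized process group and a DDP model `ddp`).
import torch.distributed.algorithms.ddp_comm_hooks.default_hooks as default_hooks
import torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook as powerSGD

# Matches the config echoed in the log: rank-1 approximation, PowerSGD kicks
# in after 1000 warm-up iterations of plain allreduce.
state = powerSGD.PowerSGDState(
    process_group=None,              # default group
    matrix_approximation_rank=1,
    start_powerSGD_iter=1000,
)

# fp16_compress_wrapper casts buckets to float16 before handing them to the
# wrapped hook and casts the result back afterwards.
# ddp.register_comm_hook(state, default_hooks.fp16_compress_wrapper(powerSGD.powerSGD_hook))

# The plain fp16 compression hook (no PowerSGD) would be registered as:
# ddp.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)
```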
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:27.7392743Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54873 2022-05-18T04:38:27.7501083Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54874 2022-05-18T04:38:28.6595420Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:28.6910846Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:29.9075324Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgski7rt_ 2022-05-18T04:38:29.9076221Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgski7rt_/_remote_module_non_scriptable.py 2022-05-18T04:38:29.9606095Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbtbp6dfi 2022-05-18T04:38:29.9608187Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbtbp6dfi/_remote_module_non_scriptable.py 2022-05-18T04:38:30.4578851Z ok (4.333s) 2022-05-18T04:38:30.4579087Z 2022-05-18T04:38:30.4579510Z ---------------------------------------------------------------------- 2022-05-18T04:38:30.4579833Z Ran 1 test in 4.333s 2022-05-18T04:38:30.4579999Z 2022-05-18T04:38:30.4580116Z OK 2022-05-18T04:38:30.4580258Z 2022-05-18T04:38:30.4580391Z Generating XML reports... 2022-05-18T04:38:30.4623473Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043826.xml 2022-05-18T04:38:31.6603048Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:38:31.6617064Z 2022-05-18T04:38:31.6617498Z Running tests... 2022-05-18T04:38:31.6617967Z ---------------------------------------------------------------------- 2022-05-18T04:38:33.2058266Z test_grad_layout_1devicemodule_1replicaperprocess (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:33.2449170Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54993 2022-05-18T04:38:33.2555860Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54994 2022-05-18T04:38:34.1736354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:34.2008262Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:35.4767388Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2i4vkob7 2022-05-18T04:38:35.4768390Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2i4vkob7/_remote_module_non_scriptable.py 2022-05-18T04:38:35.4914028Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyvv0v346 2022-05-18T04:38:35.4916924Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyvv0v346/_remote_module_non_scriptable.py 2022-05-18T04:38:36.1565191Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.1828985Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.1829480Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.1829994Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.2084526Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:38:36.2085034Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.2353719Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.2354206Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.2607240Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.2607734Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.2866858Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.2867611Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.3126871Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.3127368Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.3395610Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.3396088Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.3665156Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.3665643Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.3952941Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.3953405Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.4225471Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.4225957Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.4504069Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.4504541Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.4769319Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.4769797Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.5033499Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.5033962Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.5297456Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.5297945Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.5575517Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.5575977Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.5842884Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.5843375Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.6115577Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:38:36.6116069Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:36.9654639Z ok (5.303s) 2022-05-18T04:38:36.9654858Z 2022-05-18T04:38:36.9655279Z ---------------------------------------------------------------------- 2022-05-18T04:38:36.9655603Z Ran 1 test in 5.304s 2022-05-18T04:38:36.9655769Z 2022-05-18T04:38:36.9655882Z OK 2022-05-18T04:38:36.9656017Z 2022-05-18T04:38:36.9656148Z Generating XML reports... 2022-05-18T04:38:36.9698587Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043831.xml 2022-05-18T04:38:38.1524540Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:38:38.1538563Z 2022-05-18T04:38:38.1538705Z Running tests... 2022-05-18T04:38:38.1539557Z ---------------------------------------------------------------------- 2022-05-18T04:38:39.7301289Z test_grad_layout_2devicemodule (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:39.7693975Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55113 2022-05-18T04:38:39.7802209Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55114 2022-05-18T04:38:40.6775263Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:40.6808009Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:40.8843915Z skip: Need at least 4 CUDA devices (2.730s) 2022-05-18T04:38:40.8844174Z 2022-05-18T04:38:40.8844576Z ---------------------------------------------------------------------- 2022-05-18T04:38:40.8845000Z Ran 1 test in 2.730s 2022-05-18T04:38:40.8845296Z 2022-05-18T04:38:40.8845394Z OK (skipped=1) 2022-05-18T04:38:40.8845553Z 2022-05-18T04:38:40.8845682Z Generating XML reports... 2022-05-18T04:38:40.8886995Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043838.xml 2022-05-18T04:38:42.0517616Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:38:42.0531318Z 2022-05-18T04:38:42.0531444Z Running tests... 2022-05-18T04:38:42.0532325Z ---------------------------------------------------------------------- 2022-05-18T04:38:43.6399491Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:43.6794411Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55216 2022-05-18T04:38:43.6902781Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55217 2022-05-18T04:38:44.5942178Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:44.5946994Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.5948129Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.5949205Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.5950270Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.5951341Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.5952395Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.5987996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:44.5994471Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.5995578Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.5996632Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: 
matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.5997696Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.5998759Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.5999899Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:38:44.7944869Z ok (2.741s) 2022-05-18T04:38:44.7945121Z 2022-05-18T04:38:44.7945537Z ---------------------------------------------------------------------- 2022-05-18T04:38:44.7945888Z Ran 1 test in 2.741s 2022-05-18T04:38:44.7946053Z 2022-05-18T04:38:44.7946147Z OK 2022-05-18T04:38:44.7946288Z 2022-05-18T04:38:44.7946411Z Generating XML reports... 2022-05-18T04:38:44.7988491Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043842.xml 2022-05-18T04:38:45.9609212Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:38:45.9623561Z 2022-05-18T04:38:45.9623889Z Running tests... 2022-05-18T04:38:45.9624332Z ---------------------------------------------------------------------- 2022-05-18T04:38:47.5481031Z test_multiple_outputs_multiple_backward (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:47.5862010Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55319 2022-05-18T04:38:47.5968815Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55320 2022-05-18T04:38:48.4931273Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:48.5314006Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:49.7760775Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyyurj1e7 2022-05-18T04:38:49.7761573Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyyurj1e7/_remote_module_non_scriptable.py 2022-05-18T04:38:49.7797540Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd6vi6huy 2022-05-18T04:38:49.7800318Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd6vi6huy/_remote_module_non_scriptable.py 2022-05-18T04:38:50.3047297Z ok (4.342s) 2022-05-18T04:38:50.3047574Z 2022-05-18T04:38:50.3047991Z ---------------------------------------------------------------------- 2022-05-18T04:38:50.3048330Z Ran 1 test in 4.342s 2022-05-18T04:38:50.3048818Z 2022-05-18T04:38:50.3048927Z OK 2022-05-18T04:38:50.3049068Z 2022-05-18T04:38:50.3049185Z Generating XML reports... 2022-05-18T04:38:50.3090519Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043845.xml 2022-05-18T04:38:51.4947500Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:38:51.4961034Z 2022-05-18T04:38:51.4961268Z Running tests... 2022-05-18T04:38:51.4962183Z ---------------------------------------------------------------------- 2022-05-18T04:38:53.0786897Z test_multiple_outputs_multiple_backward_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:53.1180643Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55439 2022-05-18T04:38:53.1290944Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55440 2022-05-18T04:38:54.0329111Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:54.0355606Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:55.2998202Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpso7_ou95 2022-05-18T04:38:55.2999232Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpso7_ou95/_remote_module_non_scriptable.py 2022-05-18T04:38:55.3283009Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwlqf97wy 2022-05-18T04:38:55.3285092Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwlqf97wy/_remote_module_non_scriptable.py 2022-05-18T04:38:55.8368706Z ok (4.340s) 2022-05-18T04:38:55.8369062Z 2022-05-18T04:38:55.8369720Z ---------------------------------------------------------------------- 2022-05-18T04:38:55.8370054Z Ran 1 test in 4.341s 2022-05-18T04:38:55.8370222Z 2022-05-18T04:38:55.8370317Z OK 2022-05-18T04:38:55.8370453Z 2022-05-18T04:38:55.8370614Z Generating XML reports... 
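Note: the PowerSGD config lines printed by test_invalid_powerSGD_state mirror, field for field, the arguments of torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook.PowerSGDState; the configs with start_powerSGD_iter = 0 or 1 suggest the test checks that such small values are rejected when error feedback or warm start is enabled. A hedged sketch of building a valid state and attaching the hook to a DDP-wrapped module (ddp_model is a placeholder argument; start_powerSGD_iter = 1000 is the value the passing PowerSGD hook test later in this log uses):

from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD
from torch.nn.parallel import DistributedDataParallel as DDP

def register_powersgd(ddp_model: DDP) -> None:
    # Field values echo the configs dumped in the log above.
    state = powerSGD.PowerSGDState(
        process_group=None,               # None -> default process group
        matrix_approximation_rank=1,
        start_powerSGD_iter=1000,         # plain allreduce for the first 1000 iterations
        min_compression_rate=2,
        orthogonalization_epsilon=0,
        use_error_feedback=True,
        warm_start=True,
        random_seed=0,
        compression_stats_logging_frequency=10000,
        batch_tensors_with_same_shape=False,
    )
    ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)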
2022-05-18T04:38:55.8414120Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043851.xml 2022-05-18T04:38:57.0161242Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:38:57.0174987Z 2022-05-18T04:38:57.0175489Z Running tests... 2022-05-18T04:38:57.0175981Z ---------------------------------------------------------------------- 2022-05-18T04:38:58.5491605Z test_nccl_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:58.5881720Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55559 2022-05-18T04:38:58.5987711Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55560 2022-05-18T04:38:59.4917952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:59.4968987Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:00.7690122Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq7tr6m4q 2022-05-18T04:39:00.7690752Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq7tr6m4q/_remote_module_non_scriptable.py 2022-05-18T04:39:00.7959174Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppqu023cw 2022-05-18T04:39:00.7961674Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppqu023cw/_remote_module_non_scriptable.py 2022-05-18T04:39:01.0486517Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:01.0487060Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:01.3064969Z ok (4.289s) 2022-05-18T04:39:01.3065195Z 2022-05-18T04:39:01.3065598Z ---------------------------------------------------------------------- 2022-05-18T04:39:01.3066251Z Ran 1 test in 4.289s 2022-05-18T04:39:01.3066444Z 2022-05-18T04:39:01.3066524Z OK 2022-05-18T04:39:01.3066664Z 2022-05-18T04:39:01.3066798Z Generating XML reports... 2022-05-18T04:39:01.3109153Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043857.xml 2022-05-18T04:39:02.4995968Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:39:02.5009593Z 2022-05-18T04:39:02.5009924Z Running tests... 2022-05-18T04:39:02.5010343Z ---------------------------------------------------------------------- 2022-05-18T04:39:04.0559911Z test_nccl_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:39:04.0944664Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55679 2022-05-18T04:39:04.1050575Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55680 2022-05-18T04:39:05.0071391Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:05.0434276Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:06.2858188Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphkfl21ph 2022-05-18T04:39:06.2859263Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphkfl21ph/_remote_module_non_scriptable.py 2022-05-18T04:39:06.2920174Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_q8fp_c9 2022-05-18T04:39:06.2922909Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_q8fp_c9/_remote_module_non_scriptable.py 2022-05-18T04:39:06.5525489Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:06.5526019Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:06.8128242Z ok (4.312s) 2022-05-18T04:39:06.8128456Z 2022-05-18T04:39:06.8128878Z ---------------------------------------------------------------------- 2022-05-18T04:39:06.8129201Z Ran 1 test in 4.312s 2022-05-18T04:39:06.8129379Z 2022-05-18T04:39:06.8129474Z OK 2022-05-18T04:39:06.8129610Z 2022-05-18T04:39:06.8129744Z Generating XML reports... 2022-05-18T04:39:06.8171626Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043902.xml 2022-05-18T04:39:08.0006528Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:39:08.0020530Z 2022-05-18T04:39:08.0021023Z Running tests... 2022-05-18T04:39:08.0021656Z ---------------------------------------------------------------------- 2022-05-18T04:39:09.5769344Z test_nccl_backend_2gpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:39:09.6158473Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55799 2022-05-18T04:39:09.6265492Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55800 2022-05-18T04:39:10.5365532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:10.5397886Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:10.7307118Z skip: Need at least 4 CUDA devices (2.728s) 2022-05-18T04:39:10.7307428Z 2022-05-18T04:39:10.7307865Z ---------------------------------------------------------------------- 2022-05-18T04:39:10.7319733Z Ran 1 test in 2.729s 2022-05-18T04:39:10.7319954Z 2022-05-18T04:39:10.7320088Z OK (skipped=1) 2022-05-18T04:39:10.7320231Z 2022-05-18T04:39:10.7320360Z Generating XML reports... 2022-05-18T04:39:10.7351478Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043907.xml 2022-05-18T04:39:11.9113804Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:39:11.9129905Z 2022-05-18T04:39:11.9130428Z Running tests... 
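Note: test_nccl_backend_1gpu_module_device_ids_integer_list and test_nccl_backend_1gpu_module_device_ids_torch_device_list above cover the two accepted spellings of DistributedDataParallel's device_ids argument for a single-device module, while test_nccl_backend_multi_device_ids_not_allowed further down checks that more than one entry is rejected. A small illustrative helper (placeholder module, process group assumed already initialized):

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_single_device(module: torch.nn.Module, use_torch_device: bool) -> DDP:
    # device_ids may hold plain ints or torch.device objects; only one entry is
    # allowed for a single-CUDA-device module.
    module = module.cuda(0)
    device_ids = [torch.device("cuda:0")] if use_torch_device else [0]
    return DDP(module, device_ids=device_ids)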
2022-05-18T04:39:11.9131371Z ---------------------------------------------------------------------- 2022-05-18T04:39:13.5031813Z test_nccl_backend_4gpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:39:13.5429962Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55902 2022-05-18T04:39:13.5536749Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55903 2022-05-18T04:39:14.4515960Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:14.4849067Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:14.6578628Z skip: Need at least 8 CUDA devices (2.745s) 2022-05-18T04:39:14.6579005Z 2022-05-18T04:39:14.6579794Z ---------------------------------------------------------------------- 2022-05-18T04:39:14.6580389Z Ran 1 test in 2.745s 2022-05-18T04:39:14.6580571Z 2022-05-18T04:39:14.6580685Z OK (skipped=1) 2022-05-18T04:39:14.6580848Z 2022-05-18T04:39:14.6580957Z Generating XML reports... 2022-05-18T04:39:14.6624354Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043911.xml 2022-05-18T04:39:15.8346912Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:39:15.8360717Z 2022-05-18T04:39:15.8360923Z Running tests... 2022-05-18T04:39:15.8361439Z ---------------------------------------------------------------------- 2022-05-18T04:39:17.4179912Z test_nccl_backend_multi_device_ids_not_allowed (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:39:17.4575050Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56005 2022-05-18T04:39:17.4682789Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56006 2022-05-18T04:39:18.4130360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:18.4140014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:19.9758172Z ok (4.139s) 2022-05-18T04:39:19.9758480Z 2022-05-18T04:39:19.9759066Z ---------------------------------------------------------------------- 2022-05-18T04:39:19.9759437Z Ran 1 test in 4.140s 2022-05-18T04:39:19.9759603Z 2022-05-18T04:39:19.9759700Z OK 2022-05-18T04:39:19.9759837Z 2022-05-18T04:39:19.9759952Z Generating XML reports... 2022-05-18T04:39:19.9802234Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043915.xml 2022-05-18T04:39:21.1614709Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:39:21.1629059Z 2022-05-18T04:39:21.1629269Z Running tests... 2022-05-18T04:39:21.1629706Z ---------------------------------------------------------------------- 2022-05-18T04:39:22.7383945Z test_nccl_backend_multi_device_module_device_ids_None (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:39:22.7774996Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56114 2022-05-18T04:39:22.7882712Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56115 2022-05-18T04:39:23.7149381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:23.7409443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:23.8926095Z skip: Need at least 4 CUDA devices (2.729s) 2022-05-18T04:39:23.8926350Z 2022-05-18T04:39:23.8926751Z ---------------------------------------------------------------------- 2022-05-18T04:39:23.8927071Z Ran 1 test in 2.730s 2022-05-18T04:39:23.8927242Z 2022-05-18T04:39:23.8927357Z OK (skipped=1) 2022-05-18T04:39:23.8927514Z 2022-05-18T04:39:23.8927643Z Generating XML reports... 2022-05-18T04:39:23.8970560Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043921.xml 2022-05-18T04:39:25.0637694Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:39:25.0652004Z 2022-05-18T04:39:25.0652277Z Running tests... 2022-05-18T04:39:25.0652717Z ---------------------------------------------------------------------- 2022-05-18T04:39:26.6404173Z test_nccl_backend_single_device_module_device_ids_None (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:39:26.6796765Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56217 2022-05-18T04:39:26.6903374Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56218 2022-05-18T04:39:27.5879421Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:27.6005465Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:28.8801179Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvq1goq4y 2022-05-18T04:39:28.8802057Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvq1goq4y/_remote_module_non_scriptable.py 2022-05-18T04:39:28.8847980Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp57k41zyc 2022-05-18T04:39:28.8851006Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp57k41zyc/_remote_module_non_scriptable.py 2022-05-18T04:39:29.1413363Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:29.1413891Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:29.4984352Z ok (4.433s) 2022-05-18T04:39:29.4984644Z 2022-05-18T04:39:29.4985217Z ---------------------------------------------------------------------- 2022-05-18T04:39:29.4985566Z Ran 1 test in 4.433s 2022-05-18T04:39:29.4985753Z 2022-05-18T04:39:29.4985851Z OK 2022-05-18T04:39:29.4985975Z 2022-05-18T04:39:29.4986109Z Generating XML reports... 2022-05-18T04:39:29.5028152Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043925.xml 2022-05-18T04:39:30.6728331Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:39:30.6743133Z 2022-05-18T04:39:30.6743606Z Running tests... 
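Note: the "skip: Need at least N CUDA devices" results above come from decorators in the test harness that compare torch.cuda.device_count() against the requirement before the test body runs; this runner has fewer than four visible GPUs, so the 4- and 8-GPU cases are skipped. A stand-alone equivalent (the decorator name here is illustrative, not the harness's own):

import unittest
import torch

def require_n_gpus(n: int):
    # Skip the decorated test unless at least n CUDA devices are visible.
    return unittest.skipIf(
        torch.cuda.device_count() < n,
        f"Need at least {n} CUDA devices",
    )

class ExampleTest(unittest.TestCase):
    @require_n_gpus(4)
    def test_needs_four_gpus(self) -> None:
        self.assertGreaterEqual(torch.cuda.device_count(), 4)

if __name__ == "__main__":
    unittest.main()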
2022-05-18T04:39:30.6744081Z ---------------------------------------------------------------------- 2022-05-18T04:39:32.2099358Z test_nccl_backend_single_device_module_empty_device_ids (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:39:32.2486972Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56337 2022-05-18T04:39:32.2593406Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56338 2022-05-18T04:39:33.1641790Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:33.1644250Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:34.4751382Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq_6_75xs 2022-05-18T04:39:34.4752240Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq_6_75xs/_remote_module_non_scriptable.py 2022-05-18T04:39:34.4971027Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy9zm6hcu 2022-05-18T04:39:34.4974148Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy9zm6hcu/_remote_module_non_scriptable.py 2022-05-18T04:39:34.7596821Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:34.7597373Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:35.0672605Z ok (4.393s) 2022-05-18T04:39:35.0672868Z 2022-05-18T04:39:35.0674242Z ---------------------------------------------------------------------- 2022-05-18T04:39:35.0674946Z Ran 1 test in 4.393s 2022-05-18T04:39:35.0675243Z 2022-05-18T04:39:35.0675381Z OK 2022-05-18T04:39:35.0675642Z 2022-05-18T04:39:35.0675880Z Generating XML reports... 2022-05-18T04:39:35.0718611Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043930.xml 2022-05-18T04:39:36.2494291Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:39:36.2508762Z 2022-05-18T04:39:36.2508968Z Running tests... 2022-05-18T04:39:36.2509388Z ---------------------------------------------------------------------- 2022-05-18T04:39:37.8422713Z test_nccl_propagate_error_reason (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:39:37.8814744Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56457 2022-05-18T04:39:37.8921821Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56458 2022-05-18T04:39:38.8047744Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:38.8048281Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:56.2291048Z ok (19.978s) 2022-05-18T04:39:56.2291272Z 2022-05-18T04:39:56.2291994Z ---------------------------------------------------------------------- 2022-05-18T04:39:56.2292326Z Ran 1 test in 19.978s 2022-05-18T04:39:56.2294346Z 2022-05-18T04:39:56.2294750Z OK 2022-05-18T04:39:56.2294914Z 2022-05-18T04:39:56.2295059Z Generating XML reports... 
2022-05-18T04:39:56.2335218Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043936.xml 2022-05-18T04:39:57.4076444Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:39:57.4090591Z 2022-05-18T04:39:57.4090837Z Running tests... 2022-05-18T04:39:57.4091431Z ---------------------------------------------------------------------- 2022-05-18T04:39:57.4111726Z test_no_grad (__main__.DistributedDataParallelTest) 2022-05-18T04:39:58.9893178Z Note: this test can be sped up by only running it on a CPU module ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:39:59.0284848Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56577 2022-05-18T04:39:59.0393219Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56578 2022-05-18T04:39:59.9450267Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:59.9467617Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:01.2077053Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoqd7bt1m 2022-05-18T04:40:01.2077903Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoqd7bt1m/_remote_module_non_scriptable.py 2022-05-18T04:40:01.2548611Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpalzvcou7 2022-05-18T04:40:01.2550794Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpalzvcou7/_remote_module_non_scriptable.py 2022-05-18T04:40:01.7473042Z ok (4.338s) 2022-05-18T04:40:01.7473445Z 2022-05-18T04:40:01.7474102Z ---------------------------------------------------------------------- 2022-05-18T04:40:01.7474774Z Ran 1 test in 4.338s 2022-05-18T04:40:01.7475073Z 2022-05-18T04:40:01.7475251Z OK 2022-05-18T04:40:01.7475513Z 2022-05-18T04:40:01.7475711Z Generating XML reports... 2022-05-18T04:40:01.7519232Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043957.xml 2022-05-18T04:40:02.9397095Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:02.9410617Z 2022-05-18T04:40:02.9411139Z Running tests... 2022-05-18T04:40:02.9411957Z ---------------------------------------------------------------------- 2022-05-18T04:40:04.4808567Z test_param_layout_mismatch_error (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:04.5199866Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56693 2022-05-18T04:40:04.5307842Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56694 2022-05-18T04:40:05.4339695Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:05.4737719Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:06.7169228Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpas8jo0hj 2022-05-18T04:40:06.7170372Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpas8jo0hj/_remote_module_non_scriptable.py 2022-05-18T04:40:06.7270236Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy4_7ju_q 2022-05-18T04:40:06.7273211Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy4_7ju_q/_remote_module_non_scriptable.py 2022-05-18T04:40:07.1382209Z ok (4.197s) 2022-05-18T04:40:07.1382399Z 2022-05-18T04:40:07.1382807Z ---------------------------------------------------------------------- 2022-05-18T04:40:07.1383157Z Ran 1 test in 4.197s 2022-05-18T04:40:07.1383327Z 2022-05-18T04:40:07.1383889Z OK 2022-05-18T04:40:07.1384018Z 2022-05-18T04:40:07.1384147Z Generating XML reports... 2022-05-18T04:40:07.1426756Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044002.xml 2022-05-18T04:40:08.3109663Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:08.3123986Z 2022-05-18T04:40:08.3124469Z Running tests... 2022-05-18T04:40:08.3124965Z ---------------------------------------------------------------------- 2022-05-18T04:40:09.8961614Z test_pass_default_pg (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:09.9354183Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56809 2022-05-18T04:40:09.9461149Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56810 2022-05-18T04:40:10.8476451Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:10.8480344Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:10.8899405Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:10.8904416Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:10.8905505Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:10.8990954Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:11.0503412Z ok (2.738s) 2022-05-18T04:40:11.0503678Z 2022-05-18T04:40:11.0504559Z ---------------------------------------------------------------------- 2022-05-18T04:40:11.0504919Z Ran 1 test in 2.738s 2022-05-18T04:40:11.0505089Z 2022-05-18T04:40:11.0505188Z OK 2022-05-18T04:40:11.0505328Z 2022-05-18T04:40:11.0505463Z Generating XML reports... 
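Note: the store_based_barrier_key lines in test_pass_default_pg above are logged by torch.distributed.distributed_c10d during process-group initialization: each rank adds a key to the shared store and then waits until all world_size ranks have checked in, which is why both "Added key" and "Completed store-based barrier ... with 2 nodes" appear for each rank. A minimal sketch of an explicit-store initialization that goes through this barrier (host, port and sizes are placeholders):

import datetime
import torch.distributed as dist

def init(rank: int, world_size: int) -> None:
    store = dist.TCPStore(
        "127.0.0.1", 29500, world_size,
        is_master=(rank == 0),
        timeout=datetime.timedelta(seconds=60),
    )
    # init_process_group runs the store-based barrier that produces the log lines above.
    dist.init_process_group("nccl", store=store, rank=rank, world_size=world_size)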
2022-05-18T04:40:11.0548196Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044008.xml 2022-05-18T04:40:12.2303383Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:12.2317752Z 2022-05-18T04:40:12.2318095Z Running tests... 2022-05-18T04:40:12.2318547Z ---------------------------------------------------------------------- 2022-05-18T04:40:13.7984579Z test_powerSGD_ddp_comm_hook_nccl (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:13.8379960Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56916 2022-05-18T04:40:13.8490125Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56917 2022-05-18T04:40:14.7482049Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:14.7483630Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:14.7853723Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:14.7856571Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:16.0361384Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcwmye58j 2022-05-18T04:40:16.0362004Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcwmye58j/_remote_module_non_scriptable.py 2022-05-18T04:40:16.0373348Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuwcefxlv 2022-05-18T04:40:16.0376129Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuwcefxlv/_remote_module_non_scriptable.py 2022-05-18T04:40:16.1295646Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:16.1296772Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:16.1345126Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:16.1346217Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; 
start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:16.1395711Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:16.1396793Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:16.1445638Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:16.1446733Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:16.1495240Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:16.1496305Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:16.1545033Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:16.1546246Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:16.1595486Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; 
batch_tensors_with_same_shape = False 2022-05-18T04:40:16.1596570Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:16.4574798Z ok (4.225s) 2022-05-18T04:40:16.4575016Z 2022-05-18T04:40:16.4575424Z ---------------------------------------------------------------------- 2022-05-18T04:40:16.4575765Z Ran 1 test in 4.226s 2022-05-18T04:40:16.4575929Z 2022-05-18T04:40:16.4576032Z OK 2022-05-18T04:40:16.4576169Z 2022-05-18T04:40:16.4576299Z Generating XML reports... 2022-05-18T04:40:16.4620045Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044012.xml 2022-05-18T04:40:17.6447161Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:17.6460315Z 2022-05-18T04:40:17.6460441Z Running tests... 2022-05-18T04:40:17.6461148Z ---------------------------------------------------------------------- 2022-05-18T04:40:19.1898363Z test_powerSGD_ddp_comm_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:19.2284912Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57036 2022-05-18T04:40:19.2392065Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57037 2022-05-18T04:40:20.1396418Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:20.1398367Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:20.1826854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:20.1829376Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:21.4205083Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnlyplm_5 2022-05-18T04:40:21.4206584Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnlyplm_5/_remote_module_non_scriptable.py 2022-05-18T04:40:21.4428444Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp417wfnr5 2022-05-18T04:40:21.4431467Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp417wfnr5/_remote_module_non_scriptable.py 2022-05-18T04:40:21.5328661Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:21.5330107Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD 
config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:21.5377172Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:21.5378251Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:21.5426419Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:21.5427589Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:21.5476544Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:21.5477831Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:21.5525745Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:21.5526954Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:21.5575289Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; 
compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:21.5576482Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:40:21.5624544Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:21.5625643Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:40:21.8468553Z ok (4.200s) 2022-05-18T04:40:21.8468770Z 2022-05-18T04:40:21.8469166Z ---------------------------------------------------------------------- 2022-05-18T04:40:21.8469675Z Ran 1 test in 4.201s 2022-05-18T04:40:21.8469907Z 2022-05-18T04:40:21.8469987Z OK 2022-05-18T04:40:21.8470125Z 2022-05-18T04:40:21.8470261Z Generating XML reports... 2022-05-18T04:40:21.8516643Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044017.xml 2022-05-18T04:40:23.0331591Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:23.0345800Z 2022-05-18T04:40:23.0346044Z Running tests... 2022-05-18T04:40:23.0346490Z ---------------------------------------------------------------------- 2022-05-18T04:40:24.6042618Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:24.6428287Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57156 2022-05-18T04:40:24.6533816Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57157 2022-05-18T04:40:25.5736280Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:25.6002908Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:26.8698769Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1k0t3jyh 2022-05-18T04:40:26.8699627Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1k0t3jyh/_remote_module_non_scriptable.py 2022-05-18T04:40:26.8737309Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpggb8dco5 2022-05-18T04:40:26.8740229Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpggb8dco5/_remote_module_non_scriptable.py 2022-05-18T04:40:27.6599785Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:27.6600358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:40:27.9623830Z ok (4.927s) 2022-05-18T04:40:27.9624185Z 2022-05-18T04:40:27.9624934Z ---------------------------------------------------------------------- 2022-05-18T04:40:27.9625288Z Ran 1 test in 4.928s 2022-05-18T04:40:27.9625456Z 2022-05-18T04:40:27.9625551Z OK 2022-05-18T04:40:27.9627862Z 2022-05-18T04:40:27.9628327Z Generating XML reports... 2022-05-18T04:40:27.9667422Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044023.xml 2022-05-18T04:40:29.1512903Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:29.1526790Z 2022-05-18T04:40:29.1527074Z Running tests... 2022-05-18T04:40:29.1527497Z ---------------------------------------------------------------------- 2022-05-18T04:40:30.7165736Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:30.7551715Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57276 2022-05-18T04:40:30.7657304Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57277 2022-05-18T04:40:31.6785164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:31.6828626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:32.9435626Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4gu83e62 2022-05-18T04:40:32.9436238Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4gu83e62/_remote_module_non_scriptable.py 2022-05-18T04:40:32.9756093Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphyj3f1nn 2022-05-18T04:40:32.9758803Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphyj3f1nn/_remote_module_non_scriptable.py 2022-05-18T04:40:33.6209565Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:33.6210137Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:33.8743805Z ok (4.721s) 2022-05-18T04:40:33.8744022Z 2022-05-18T04:40:33.8744864Z ---------------------------------------------------------------------- 2022-05-18T04:40:33.8745216Z Ran 1 test in 4.722s 2022-05-18T04:40:33.8745364Z 2022-05-18T04:40:33.8745472Z OK 2022-05-18T04:40:33.8745607Z 2022-05-18T04:40:33.8745739Z Generating XML reports... 2022-05-18T04:40:33.8787418Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044029.xml 2022-05-18T04:40:35.0664573Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:35.0678936Z 2022-05-18T04:40:35.0679170Z Running tests... 2022-05-18T04:40:35.0679629Z ---------------------------------------------------------------------- 2022-05-18T04:40:36.6301314Z test_invalid_nccl_blocking_wait_env (__main__.NcclErrorHandlingTest) ... 
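Note: the two sync-batch-norm tests that just passed convert the model's BatchNorm layers to SyncBatchNorm before wrapping it in DDP so that normalization statistics are reduced across ranks; the "_empty_input"/"_only_empty_input" variants feed some or all ranks zero-size batches to make sure the collective still completes. A hedged sketch of the conversion step (model and device are placeholders):

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def make_sync_bn_ddp(model: torch.nn.Module, device: int) -> DDP:
    # Replace every BatchNorm*d in the model with SyncBatchNorm, then wrap in DDP.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model).cuda(device)
    return DDP(model, device_ids=[device])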
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:36.6696751Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57396 2022-05-18T04:40:36.6804325Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57397 2022-05-18T04:40:36.6915092Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57398 2022-05-18T04:40:37.5787671Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:37.6254400Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:40:37.6652980Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:37.8961842Z skip: Need at least 3 CUDA devices (2.828s) 2022-05-18T04:40:37.8962095Z 2022-05-18T04:40:37.8962516Z ---------------------------------------------------------------------- 2022-05-18T04:40:37.8962860Z Ran 1 test in 2.828s 2022-05-18T04:40:37.8963008Z 2022-05-18T04:40:37.8963122Z OK (skipped=1) 2022-05-18T04:40:37.8963283Z 2022-05-18T04:40:37.8963409Z Generating XML reports... 2022-05-18T04:40:37.9005558Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044035.xml 2022-05-18T04:40:39.0678078Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:39.0692032Z 2022-05-18T04:40:39.0692457Z Running tests... 2022-05-18T04:40:39.0692965Z ---------------------------------------------------------------------- 2022-05-18T04:40:40.6513522Z test_nccl_blocking_wait_with_barrier (__main__.NcclErrorHandlingTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:40.6910462Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57533 2022-05-18T04:40:40.7018627Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57534 2022-05-18T04:40:40.7129309Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57535 2022-05-18T04:40:41.6001918Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:41.6609527Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:41.6953399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:40:41.9176433Z skip: Need at least 3 CUDA devices (2.848s) 2022-05-18T04:40:41.9176693Z 2022-05-18T04:40:41.9177127Z ---------------------------------------------------------------------- 2022-05-18T04:40:41.9177451Z Ran 1 test in 2.848s 2022-05-18T04:40:41.9177617Z 2022-05-18T04:40:41.9177727Z OK (skipped=1) 2022-05-18T04:40:41.9177887Z 2022-05-18T04:40:41.9178013Z Generating XML reports... 2022-05-18T04:40:41.9225549Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044039.xml 2022-05-18T04:40:43.0889796Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:43.0904711Z 2022-05-18T04:40:43.0905110Z Running tests... 2022-05-18T04:40:43.0905620Z ---------------------------------------------------------------------- 2022-05-18T04:40:43.0912502Z test_nccl_errors_blocking_abort (__main__.NcclErrorHandlingTest) ... 
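Note: the NcclErrorHandlingTest cases above and below revolve around NCCL_BLOCKING_WAIT, the ProcessGroupNCCL environment variable that makes collectives block until completion and surface NCCL errors or timeouts as Python exceptions; test_invalid_nccl_blocking_wait_env presumably sets it to something other than "0"/"1" and expects a failure. A hedged sketch of enabling it (the variable has to be set before the NCCL process group is created; launcher env vars are assumed):

import datetime
import os
import torch.distributed as dist

# Must be exported before ProcessGroupNCCL is constructed; "0" disables it.
os.environ["NCCL_BLOCKING_WAIT"] = "1"

# Assumes RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT are set by the launcher.
dist.init_process_group(
    backend="nccl",
    timeout=datetime.timedelta(seconds=30),  # blocking collectives error out after this
)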
skip: Frequently times out see https://github.com/pytorch/pytorch/issues/58920 (0.001s) 2022-05-18T04:40:43.0912928Z 2022-05-18T04:40:43.0913297Z ---------------------------------------------------------------------- 2022-05-18T04:40:43.0913669Z Ran 1 test in 0.001s 2022-05-18T04:40:43.0913813Z 2022-05-18T04:40:43.0913927Z OK (skipped=1) 2022-05-18T04:40:43.0914083Z 2022-05-18T04:40:43.0914211Z Generating XML reports... 2022-05-18T04:40:43.0949321Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044043.xml 2022-05-18T04:40:44.1141886Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:44.1156792Z 2022-05-18T04:40:44.1157200Z Running tests... 2022-05-18T04:40:44.1157723Z ---------------------------------------------------------------------- 2022-05-18T04:40:45.6790325Z test_nccl_errors_blocking_clean_exit (__main__.NcclErrorHandlingTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:45.7180234Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57703 2022-05-18T04:40:45.7287487Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57704 2022-05-18T04:40:45.7397118Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57705 2022-05-18T04:40:46.6272955Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:46.6406963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:46.6427560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:40:46.8444137Z skip: Need at least 3 CUDA devices (2.728s) 2022-05-18T04:40:46.8444459Z 2022-05-18T04:40:46.8444995Z ---------------------------------------------------------------------- 2022-05-18T04:40:46.8445321Z Ran 1 test in 2.729s 2022-05-18T04:40:46.8445488Z 2022-05-18T04:40:46.8445598Z OK (skipped=1) 2022-05-18T04:40:46.8445753Z 2022-05-18T04:40:46.8448377Z Generating XML reports... 2022-05-18T04:40:46.8489856Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044044.xml 2022-05-18T04:40:48.0254718Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:48.0268963Z 2022-05-18T04:40:48.0269171Z Running tests... 2022-05-18T04:40:48.0269625Z ---------------------------------------------------------------------- 2022-05-18T04:40:49.5996349Z test_nccl_errors_blocking_nonzero_exit (__main__.NcclErrorHandlingTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:49.6392580Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57840 2022-05-18T04:40:49.6499591Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57841 2022-05-18T04:40:49.6609816Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57842 2022-05-18T04:40:50.5772991Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:50.5789505Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:40:50.6266101Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:50.7657525Z ok (2.739s) 2022-05-18T04:40:50.7657920Z 2022-05-18T04:40:50.7658715Z ---------------------------------------------------------------------- 2022-05-18T04:40:50.7659109Z Ran 1 test in 2.739s 2022-05-18T04:40:50.7659294Z 2022-05-18T04:40:50.7659390Z OK 2022-05-18T04:40:50.7659526Z 2022-05-18T04:40:50.7659641Z Generating XML reports... 2022-05-18T04:40:50.7702921Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044048.xml 2022-05-18T04:40:51.9437296Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:51.9451006Z 2022-05-18T04:40:51.9451438Z Running tests... 2022-05-18T04:40:51.9451928Z ---------------------------------------------------------------------- 2022-05-18T04:40:53.5162120Z test_nccl_errors_blocking_sigkill (__main__.NcclErrorHandlingTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:53.5550325Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57977 2022-05-18T04:40:53.5656725Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57978 2022-05-18T04:40:53.5766699Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 57979 2022-05-18T04:40:54.4554435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:54.4654359Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:54.4697483Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:40:54.6813665Z ok (2.736s) 2022-05-18T04:40:54.6813858Z 2022-05-18T04:40:54.6814391Z ---------------------------------------------------------------------- 2022-05-18T04:40:54.6814863Z Ran 1 test in 2.736s 2022-05-18T04:40:54.6815030Z 2022-05-18T04:40:54.6815340Z OK 2022-05-18T04:40:54.6815499Z 2022-05-18T04:40:54.6815612Z Generating XML reports... 2022-05-18T04:40:54.6856850Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044051.xml 2022-05-18T04:40:55.8506824Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:55.8521020Z 2022-05-18T04:40:55.8521318Z Running tests... 2022-05-18T04:40:55.8521760Z ---------------------------------------------------------------------- 2022-05-18T04:40:57.4447288Z test_nccl_errors_blocking_sigterm (__main__.NcclErrorHandlingTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:40:57.4843713Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58114 2022-05-18T04:40:57.4952495Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58115 2022-05-18T04:40:57.5062182Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 58116 2022-05-18T04:40:58.4102100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:40:58.4252434Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:58.4650929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:58.6109706Z ok (2.759s) 2022-05-18T04:40:58.6110127Z 2022-05-18T04:40:58.6110545Z ---------------------------------------------------------------------- 2022-05-18T04:40:58.6110904Z Ran 1 test in 2.759s 2022-05-18T04:40:58.6111078Z 2022-05-18T04:40:58.6111155Z OK 2022-05-18T04:40:58.6111303Z 2022-05-18T04:40:58.6111437Z Generating XML reports... 2022-05-18T04:40:58.6154020Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044055.xml 2022-05-18T04:40:59.7806585Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:40:59.7820479Z 2022-05-18T04:40:59.7821079Z Running tests... 2022-05-18T04:40:59.7821734Z ---------------------------------------------------------------------- 2022-05-18T04:40:59.7836063Z test_nccl_errors_nonblocking (__main__.NcclErrorHandlingTest) ... skip: Test does not pass when run locally (0.001s) 2022-05-18T04:40:59.7836389Z 2022-05-18T04:40:59.7836677Z ---------------------------------------------------------------------- 2022-05-18T04:40:59.7837024Z Ran 1 test in 0.002s 2022-05-18T04:40:59.7837190Z 2022-05-18T04:40:59.7837283Z OK (skipped=1) 2022-05-18T04:40:59.7837438Z 2022-05-18T04:40:59.7837564Z Generating XML reports... 2022-05-18T04:40:59.7872263Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044059.xml 2022-05-18T04:41:00.8140789Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:00.8154985Z 2022-05-18T04:41:00.8155342Z Running tests... 2022-05-18T04:41:00.8155818Z ---------------------------------------------------------------------- 2022-05-18T04:41:02.3884283Z test_nccl_timeout (__main__.NcclErrorHandlingTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:41:02.4272824Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58284 2022-05-18T04:41:02.4381093Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58285 2022-05-18T04:41:02.4491704Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 58286 2022-05-18T04:41:03.3393326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:03.3691306Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:03.3757938Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:41:03.5535969Z skip: Need at least 3 CUDA devices (2.738s) 2022-05-18T04:41:03.5536401Z 2022-05-18T04:41:03.5537125Z ---------------------------------------------------------------------- 2022-05-18T04:41:03.5537492Z Ran 1 test in 2.738s 2022-05-18T04:41:03.5537641Z 2022-05-18T04:41:03.5537752Z OK (skipped=1) 2022-05-18T04:41:03.5537915Z 2022-05-18T04:41:03.5538043Z Generating XML reports... 2022-05-18T04:41:03.5578868Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044100.xml 2022-05-18T04:41:04.7195991Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:04.7209479Z 2022-05-18T04:41:04.7209845Z Running tests... 2022-05-18T04:41:04.7210340Z ---------------------------------------------------------------------- 2022-05-18T04:41:04.7216703Z test_init_no_gpus (__main__.ProcessGroupNCCLNoGPUTest) ... skip: GPUs are available, skipping test (0.001s) 2022-05-18T04:41:04.7217159Z 2022-05-18T04:41:04.7217431Z ---------------------------------------------------------------------- 2022-05-18T04:41:04.7217787Z Ran 1 test in 0.001s 2022-05-18T04:41:04.7217965Z 2022-05-18T04:41:04.7218077Z OK (skipped=1) 2022-05-18T04:41:04.7218232Z 2022-05-18T04:41:04.7218369Z Generating XML reports... 2022-05-18T04:41:04.7252958Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLNoGPUTest-20220518044104.xml 2022-05-18T04:41:05.7419823Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:05.7434246Z 2022-05-18T04:41:05.7434547Z Running tests... 2022-05-18T04:41:05.7434968Z ---------------------------------------------------------------------- 2022-05-18T04:41:07.3259631Z test_allgather_base_basics (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:41:07.3653515Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58454 2022-05-18T04:41:07.3762021Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58455 2022-05-18T04:41:08.3122540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:08.3124922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:08.3463433Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:08.3467910Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:08.3469582Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
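Several of the NcclErrorHandlingTest cases above are skipped with "Need at least 3 CUDA devices", and test_init_no_gpus is skipped because GPUs are present on this runner. As a hedged illustration only (the suite uses PyTorch's internal test decorators, which are not reproduced here), a device-count guard of that kind can be written as:

```python
# Hypothetical sketch of a device-count skip, similar in spirit to the
# "Need at least 3 CUDA devices" messages above; not the test suite's code.
import unittest

import torch


class ThreeGpuExample(unittest.TestCase):
    @unittest.skipIf(
        not torch.cuda.is_available() or torch.cuda.device_count() < 3,
        "Need at least 3 CUDA devices",
    )
    def test_needs_three_gpus(self):
        # Only runs when three or more GPUs are visible to this process.
        self.assertGreaterEqual(torch.cuda.device_count(), 3)


if __name__ == "__main__":
    unittest.main()
```

On this two-GPU runner such a guard reports the test as skipped rather than failed, which is why the suite above still finishes with "OK (skipped=1)".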
2022-05-18T04:41:08.3534094Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:09.8844737Z ok (4.141s) 2022-05-18T04:41:09.8844967Z 2022-05-18T04:41:09.8845358Z ---------------------------------------------------------------------- 2022-05-18T04:41:09.8845707Z Ran 1 test in 4.141s 2022-05-18T04:41:09.8845872Z 2022-05-18T04:41:09.8845967Z OK 2022-05-18T04:41:09.8846102Z 2022-05-18T04:41:09.8846265Z Generating XML reports... 2022-05-18T04:41:09.8890525Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044105.xml 2022-05-18T04:41:11.0855216Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:11.0869518Z 2022-05-18T04:41:11.0869950Z Running tests... 2022-05-18T04:41:11.0870423Z ---------------------------------------------------------------------- 2022-05-18T04:41:12.6574099Z test_allgather_base_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:41:12.6961363Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58563 2022-05-18T04:41:12.7069086Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58564 2022-05-18T04:41:13.5903748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:13.5905608Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:13.6026996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:13.6030735Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:13.6031542Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:13.6111046Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:15.2151230Z ok (4.128s) 2022-05-18T04:41:15.2151445Z 2022-05-18T04:41:15.2151862Z ---------------------------------------------------------------------- 2022-05-18T04:41:15.2152188Z Ran 1 test in 4.128s 2022-05-18T04:41:15.2152358Z 2022-05-18T04:41:15.2152452Z OK 2022-05-18T04:41:15.2152590Z 2022-05-18T04:41:15.2156519Z Generating XML reports... 2022-05-18T04:41:15.2196085Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044111.xml 2022-05-18T04:41:16.3843677Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:16.3857528Z 2022-05-18T04:41:16.3857786Z Running tests... 2022-05-18T04:41:16.3858239Z ---------------------------------------------------------------------- 2022-05-18T04:41:17.9657647Z test_allgather_ops (__main__.ProcessGroupNCCLTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:41:18.0044651Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58679 2022-05-18T04:41:18.0151633Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58680 2022-05-18T04:41:18.9115282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:18.9117567Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:18.9596120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:18.9599757Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:18.9600706Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:18.9627497Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:20.6238756Z ok (4.238s) 2022-05-18T04:41:20.6238975Z 2022-05-18T04:41:20.6239373Z ---------------------------------------------------------------------- 2022-05-18T04:41:20.6239713Z Ran 1 test in 4.238s 2022-05-18T04:41:20.6239881Z 2022-05-18T04:41:20.6239980Z OK 2022-05-18T04:41:20.6240116Z 2022-05-18T04:41:20.6240248Z Generating XML reports... 2022-05-18T04:41:20.6284237Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044116.xml 2022-05-18T04:41:21.8152711Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:21.8166085Z 2022-05-18T04:41:21.8166239Z Running tests... 2022-05-18T04:41:21.8166961Z ---------------------------------------------------------------------- 2022-05-18T04:41:23.3829482Z test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:41:23.4214923Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58795 2022-05-18T04:41:23.4322457Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58796 2022-05-18T04:41:24.3356680Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:24.3358583Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:24.3664822Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:24.3669615Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:24.3671047Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:24.3767818Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:26.0406691Z ok (4.224s) 2022-05-18T04:41:26.0406909Z 2022-05-18T04:41:26.0407330Z ---------------------------------------------------------------------- 2022-05-18T04:41:26.0407656Z Ran 1 test in 4.224s 2022-05-18T04:41:26.0407829Z 2022-05-18T04:41:26.0407926Z OK 2022-05-18T04:41:26.0408061Z 2022-05-18T04:41:26.0408194Z Generating XML reports... 
2022-05-18T04:41:26.0449815Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044121.xml 2022-05-18T04:41:27.2139591Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:27.2154150Z 2022-05-18T04:41:27.2154273Z Running tests... 2022-05-18T04:41:27.2154951Z ---------------------------------------------------------------------- 2022-05-18T04:41:28.8012627Z test_barrier (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:41:28.8399653Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58911 2022-05-18T04:41:28.8506785Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58912 2022-05-18T04:41:29.7495484Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:29.7497994Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:29.7667767Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:29.7672046Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:29.7672833Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:29.7702843Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:31.4590233Z ok (4.243s) 2022-05-18T04:41:31.4590459Z 2022-05-18T04:41:31.4590873Z ---------------------------------------------------------------------- 2022-05-18T04:41:31.4591201Z Ran 1 test in 4.244s 2022-05-18T04:41:31.4591371Z 2022-05-18T04:41:31.4591465Z OK 2022-05-18T04:41:31.4591603Z 2022-05-18T04:41:31.4591736Z Generating XML reports... 2022-05-18T04:41:31.4634400Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044127.xml 2022-05-18T04:41:32.6517091Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:32.6531158Z 2022-05-18T04:41:32.6531398Z Running tests... 2022-05-18T04:41:32.6532295Z ---------------------------------------------------------------------- 2022-05-18T04:41:34.2270377Z test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:41:34.2653895Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59027 2022-05-18T04:41:34.2761741Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59028 2022-05-18T04:41:35.1645192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:35.1647229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:35.1718442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:35.1722200Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:35.1723528Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:35.1750656Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
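The ProcessGroupNCCLTest cases in this stretch (test_allgather_base_basics, test_allgather_base_ops, test_allgather_ops, test_allreduce_ops, test_barrier, test_broadcast_ops) each start two worker processes, form an NCCL process group, and run one collective. A minimal sketch of that pattern, assuming two visible CUDA devices and a free local TCP port (it is not the test file's code):

```python
# Two spawned workers join an NCCL group and run an all-reduce.
# Port 29500 and the TCP init method are placeholder choices.
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:29500",
        rank=rank,
        world_size=world_size,
    )
    torch.cuda.set_device(rank)
    t = torch.full((4,), float(rank), device="cuda")
    dist.all_reduce(t)  # every rank ends up with the sum over all ranks
    print(f"rank {rank}: {t.tolist()}")
    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```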
2022-05-18T04:41:36.7835261Z ok (4.130s) 2022-05-18T04:41:36.7835534Z 2022-05-18T04:41:36.7835948Z ---------------------------------------------------------------------- 2022-05-18T04:41:36.7836310Z Ran 1 test in 4.130s 2022-05-18T04:41:36.7836478Z 2022-05-18T04:41:36.7836574Z OK 2022-05-18T04:41:36.7836720Z 2022-05-18T04:41:36.7836836Z Generating XML reports... 2022-05-18T04:41:36.7878788Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044132.xml 2022-05-18T04:41:37.9584310Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:37.9598158Z 2022-05-18T04:41:37.9598649Z Running tests... 2022-05-18T04:41:37.9599174Z ---------------------------------------------------------------------- 2022-05-18T04:41:39.5403886Z test_empty_tensors (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:41:39.5791668Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59143 2022-05-18T04:41:39.5898758Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59144 2022-05-18T04:41:40.4874444Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:40.4876695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:40.4902823Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:40.4906744Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:40.4907787Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:40.4980083Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:43.1997074Z ok (5.240s) 2022-05-18T04:41:43.1997303Z 2022-05-18T04:41:43.1998340Z ---------------------------------------------------------------------- 2022-05-18T04:41:43.1999022Z Ran 1 test in 5.240s 2022-05-18T04:41:43.1999351Z 2022-05-18T04:41:43.1999534Z OK 2022-05-18T04:41:43.1999793Z 2022-05-18T04:41:43.2000034Z Generating XML reports... 2022-05-18T04:41:43.2043138Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044137.xml 2022-05-18T04:41:44.3961012Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:44.3974404Z 2022-05-18T04:41:44.3974529Z Running tests... 2022-05-18T04:41:44.3975565Z ---------------------------------------------------------------------- 2022-05-18T04:41:45.9877914Z test_gather_checks (__main__.ProcessGroupNCCLTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:41:46.0273680Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59259 2022-05-18T04:41:46.0380105Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59260 2022-05-18T04:41:46.9406935Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:46.9409367Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:46.9476446Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:46.9480037Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:46.9481666Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:46.9512858Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:48.4452214Z ok (4.047s) 2022-05-18T04:41:48.4452684Z 2022-05-18T04:41:48.4453475Z ---------------------------------------------------------------------- 2022-05-18T04:41:48.4453896Z Ran 1 test in 4.048s 2022-05-18T04:41:48.4454063Z 2022-05-18T04:41:48.4454156Z OK 2022-05-18T04:41:48.4454294Z 2022-05-18T04:41:48.4454425Z Generating XML reports... 2022-05-18T04:41:48.4496088Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044144.xml 2022-05-18T04:41:49.6283232Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:49.6296962Z 2022-05-18T04:41:49.6297110Z Running tests... 2022-05-18T04:41:49.6297795Z ---------------------------------------------------------------------- 2022-05-18T04:41:51.1781605Z test_gather_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:41:51.2171477Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59368 2022-05-18T04:41:51.2278845Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59369 2022-05-18T04:41:52.1474464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:52.1477208Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:52.1857284Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:52.1861398Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:52.1862414Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:52.1885148Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:53.8355004Z ok (4.205s) 2022-05-18T04:41:53.8355225Z 2022-05-18T04:41:53.8355606Z ---------------------------------------------------------------------- 2022-05-18T04:41:53.8355952Z Ran 1 test in 4.206s 2022-05-18T04:41:53.8356123Z 2022-05-18T04:41:53.8356220Z OK 2022-05-18T04:41:53.8356357Z 2022-05-18T04:41:53.8356491Z Generating XML reports... 
2022-05-18T04:41:53.8398573Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044149.xml 2022-05-18T04:41:54.9996093Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:41:55.0010122Z 2022-05-18T04:41:55.0010554Z Running tests... 2022-05-18T04:41:55.0011064Z ---------------------------------------------------------------------- 2022-05-18T04:41:56.5712963Z test_gather_stress (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:41:56.6110259Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59484 2022-05-18T04:41:56.6217368Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59485 2022-05-18T04:41:57.5436731Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:57.5439171Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:57.5601409Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:57.5605176Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:57.5605984Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:57.5644240Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:01.8342784Z ok (6.833s) 2022-05-18T04:42:01.8343193Z 2022-05-18T04:42:01.8344704Z ---------------------------------------------------------------------- 2022-05-18T04:42:01.8345086Z Ran 1 test in 6.833s 2022-05-18T04:42:01.8345256Z 2022-05-18T04:42:01.8345354Z OK 2022-05-18T04:42:01.8345469Z 2022-05-18T04:42:01.8345603Z Generating XML reports... 2022-05-18T04:42:01.8386917Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044154.xml 2022-05-18T04:42:03.0490336Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:42:03.0504295Z 2022-05-18T04:42:03.0504538Z Running tests... 2022-05-18T04:42:03.0504975Z ---------------------------------------------------------------------- 2022-05-18T04:42:04.6365682Z test_reduce_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:42:04.6763110Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59600 2022-05-18T04:42:04.6872711Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59601 2022-05-18T04:42:05.5702015Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:05.5704253Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:05.5823371Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:05.5827600Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:05.5828418Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:05.5909323Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
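Every test run above logs "Added key: store_based_barrier_key:1" followed by "Completed store-based barrier ... with 2 nodes"; this is the store-based barrier that torch.distributed runs at the end of process-group initialization. The following is a deliberately simplified, hypothetical illustration of the idea (each rank bumps a counter in a shared store, then polls until all ranks have checked in); it is not PyTorch's internal implementation:

```python
# Conceptual store-based barrier demo; HashStore and two threads stand in
# for the real TCPStore and multiple ranks so the example runs standalone.
import threading
import time

import torch.distributed as dist


def store_barrier(store, world_size: int, key: str = "demo_barrier") -> None:
    store.add(key, 1)                    # register this rank's arrival
    while store.add(key, 0) < world_size:  # adding 0 just reads the counter
        time.sleep(0.01)


if __name__ == "__main__":
    store = dist.HashStore()  # in-memory store, fine for a single-process demo
    workers = [threading.Thread(target=store_barrier, args=(store, 2)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("both workers passed the barrier")
```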
2022-05-18T04:42:07.1946041Z ok (4.144s) 2022-05-18T04:42:07.1946248Z 2022-05-18T04:42:07.1946879Z ---------------------------------------------------------------------- 2022-05-18T04:42:07.1947270Z Ran 1 test in 4.144s 2022-05-18T04:42:07.1947421Z 2022-05-18T04:42:07.1947517Z OK 2022-05-18T04:42:07.1948470Z 2022-05-18T04:42:07.1950303Z Generating XML reports... 2022-05-18T04:42:07.1991471Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044203.xml 2022-05-18T04:42:08.3781214Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:42:08.3796041Z 2022-05-18T04:42:08.3796502Z Running tests... 2022-05-18T04:42:08.3796995Z ---------------------------------------------------------------------- 2022-05-18T04:42:09.9553700Z test_reduce_scatter_base_basics (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:42:09.9950117Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59716 2022-05-18T04:42:10.0058072Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59717 2022-05-18T04:42:10.8963776Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:10.8966229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:10.9028773Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:10.9032492Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:10.9033640Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:10.9069443Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:12.4129396Z ok (4.033s) 2022-05-18T04:42:12.4129618Z 2022-05-18T04:42:12.4130029Z ---------------------------------------------------------------------- 2022-05-18T04:42:12.4130372Z Ran 1 test in 4.033s 2022-05-18T04:42:12.4130520Z 2022-05-18T04:42:12.4130624Z OK 2022-05-18T04:42:12.4131092Z 2022-05-18T04:42:12.4131245Z Generating XML reports... 2022-05-18T04:42:12.4173814Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044208.xml 2022-05-18T04:42:13.6035192Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:42:13.6048866Z 2022-05-18T04:42:13.6049315Z Running tests... 2022-05-18T04:42:13.6049997Z ---------------------------------------------------------------------- 2022-05-18T04:42:15.1757949Z test_reduce_scatter_base_ops (__main__.ProcessGroupNCCLTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:42:15.2153099Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59825 2022-05-18T04:42:15.2260143Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59826 2022-05-18T04:42:16.1557331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:16.1560619Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:16.1626366Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:16.1630046Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:16.1631155Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:16.1664475Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:17.8336440Z ok (4.228s) 2022-05-18T04:42:17.8336681Z 2022-05-18T04:42:17.8337094Z ---------------------------------------------------------------------- 2022-05-18T04:42:17.8337423Z Ran 1 test in 4.229s 2022-05-18T04:42:17.8337588Z 2022-05-18T04:42:17.8337685Z OK 2022-05-18T04:42:17.8337821Z 2022-05-18T04:42:17.8337955Z Generating XML reports... 2022-05-18T04:42:17.8380515Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044213.xml 2022-05-18T04:42:19.0206269Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:42:19.0219893Z 2022-05-18T04:42:19.0220281Z Running tests... 2022-05-18T04:42:19.0220810Z ---------------------------------------------------------------------- 2022-05-18T04:42:20.5855603Z test_reduce_scatter_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:42:20.6248357Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59941 2022-05-18T04:42:20.6356467Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59942 2022-05-18T04:42:21.5603688Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:21.5606300Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:21.5887594Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:21.5891164Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:21.5891981Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:21.5912814Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:23.2433347Z ok (4.221s) 2022-05-18T04:42:23.2433558Z 2022-05-18T04:42:23.2433964Z ---------------------------------------------------------------------- 2022-05-18T04:42:23.2434319Z Ran 1 test in 4.221s 2022-05-18T04:42:23.2434486Z 2022-05-18T04:42:23.2434583Z OK 2022-05-18T04:42:23.2434707Z 2022-05-18T04:42:23.2434844Z Generating XML reports... 
2022-05-18T04:42:23.2477518Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044219.xml 2022-05-18T04:42:24.4143203Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:42:24.4157628Z 2022-05-18T04:42:24.4157784Z Running tests... 2022-05-18T04:42:24.4158227Z ---------------------------------------------------------------------- 2022-05-18T04:42:25.9989535Z test_scatter_checks (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:42:26.0384750Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60057 2022-05-18T04:42:26.0491753Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60058 2022-05-18T04:42:26.9896418Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:26.9898391Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:27.0000049Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:27.0003726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:27.0004809Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:27.0103219Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:28.5566037Z ok (4.141s) 2022-05-18T04:42:28.5566265Z 2022-05-18T04:42:28.5566672Z ---------------------------------------------------------------------- 2022-05-18T04:42:28.5567001Z Ran 1 test in 4.141s 2022-05-18T04:42:28.5567175Z 2022-05-18T04:42:28.5567273Z OK 2022-05-18T04:42:28.5567411Z 2022-05-18T04:42:28.5567545Z Generating XML reports... 2022-05-18T04:42:28.5610052Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044224.xml 2022-05-18T04:42:29.7312305Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:42:29.7325691Z 2022-05-18T04:42:29.7325831Z Running tests... 2022-05-18T04:42:29.7326578Z ---------------------------------------------------------------------- 2022-05-18T04:42:31.2874910Z test_scatter_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:42:31.3260696Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60166 2022-05-18T04:42:31.3368690Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60167 2022-05-18T04:42:32.2225921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:32.2229230Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:32.2303619Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:32.2307897Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:32.2308979Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:32.2332078Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:42:33.8441863Z ok (4.111s) 2022-05-18T04:42:33.8442291Z 2022-05-18T04:42:33.8442800Z ---------------------------------------------------------------------- 2022-05-18T04:42:33.8443168Z Ran 1 test in 4.112s 2022-05-18T04:42:33.8443344Z 2022-05-18T04:42:33.8443421Z OK 2022-05-18T04:42:33.8443557Z 2022-05-18T04:42:33.8443692Z Generating XML reports... 2022-05-18T04:42:33.8486458Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044229.xml 2022-05-18T04:42:35.0021248Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:42:35.0035518Z 2022-05-18T04:42:35.0035676Z Running tests... 2022-05-18T04:42:35.0036105Z ---------------------------------------------------------------------- 2022-05-18T04:42:36.5434559Z test_scatter_stress (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:42:36.5820221Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60282 2022-05-18T04:42:36.5926851Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60283 2022-05-18T04:42:37.4951523Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:37.4954022Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:37.5300918Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:37.5304554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:37.5305380Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:37.5362424Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:41.9053949Z ok (6.902s) 2022-05-18T04:42:41.9054186Z 2022-05-18T04:42:41.9054963Z ---------------------------------------------------------------------- 2022-05-18T04:42:41.9055311Z Ran 1 test in 6.902s 2022-05-18T04:42:41.9055477Z 2022-05-18T04:42:41.9055575Z OK 2022-05-18T04:42:41.9055693Z 2022-05-18T04:42:41.9055831Z Generating XML reports... 2022-05-18T04:42:41.9098909Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044234.xml 2022-05-18T04:42:43.0791437Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:42:43.0804901Z 2022-05-18T04:42:43.0805146Z Running tests... 2022-05-18T04:42:43.0805607Z ---------------------------------------------------------------------- 2022-05-18T04:42:44.6206342Z test_common_errors (__main__.RendezvousEnvTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:42:44.6369760Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:44.6370866Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:42:44.6392305Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:44.6393085Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
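The RendezvousEnvTest.test_common_errors run that starts here, and the TimeoutTest.test_default_store_timeout_nccl run that follows, exercise the env:// rendezvous and the store timeout passed to init_process_group. A hedged, single-rank sketch of that configuration (placeholder address and port, gloo backend so it runs on CPU; not the tests' own code):

```python
# env:// rendezvous reads MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE from
# the environment; leaving any of them unset is the kind of configuration
# error test_common_errors checks for. Values below are placeholders.
import os
from datetime import timedelta

import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

dist.init_process_group(
    backend="gloo",
    init_method="env://",
    timeout=timedelta(seconds=60),  # store operations fail if peers miss this window
)
dist.barrier()
dist.destroy_process_group()
```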
2022-05-18T04:42:44.6411094Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:44.6412251Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:42:44.6430534Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:44.6431644Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:42:44.6496088Z ok (1.569s) 2022-05-18T04:42:44.6497048Z 2022-05-18T04:42:44.6497562Z ---------------------------------------------------------------------- 2022-05-18T04:42:44.6497918Z Ran 1 test in 1.569s 2022-05-18T04:42:44.6498066Z 2022-05-18T04:42:44.6498163Z OK 2022-05-18T04:42:44.6498301Z 2022-05-18T04:42:44.6498431Z Generating XML reports... 2022-05-18T04:42:44.6530313Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-RendezvousEnvTest-20220518044243.xml 2022-05-18T04:42:45.7820113Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:42:45.7834700Z 2022-05-18T04:42:45.7835065Z Running tests... 2022-05-18T04:42:45.7836003Z ---------------------------------------------------------------------- 2022-05-18T04:42:47.3598556Z test_default_store_timeout_nccl (__main__.TimeoutTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:42:47.3756497Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:47.3757289Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:42:49.3850454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:49.3851447Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:42:50.4062959Z ok (4.623s) 2022-05-18T04:42:50.4063680Z 2022-05-18T04:42:50.4064351Z ---------------------------------------------------------------------- 2022-05-18T04:42:50.4064722Z Ran 1 test in 4.623s 2022-05-18T04:42:50.4064895Z 2022-05-18T04:42:50.4065006Z OK 2022-05-18T04:42:50.4065142Z 2022-05-18T04:42:50.4065272Z Generating XML reports... 2022-05-18T04:42:50.4098306Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-TimeoutTest-20220518044245.xml 2022-05-18T04:42:50.7905118Z Running distributed/test_c10d_gloo ... [2022-05-18 04:42:50.789929] 2022-05-18T04:42:50.7906193Z Executing ['/opt/conda/bin/python', 'distributed/test_c10d_gloo.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 04:42:50.790040] 2022-05-18T04:42:51.7094173Z , <__main__.CommTest testMethod=test_broadcast_coalesced_gloo_cuda>, <__main__.CommTest testMethod=test_gloo_barrier_device_ids>, <__main__.CommTest testMethod=test_gloo_warn_not_in_group>, <__main__.CommTest testMethod=test_sequence_num_incremented_gloo_default>, <__main__.CommTest testMethod=test_sequence_num_incremented_gloo_subgroup>, <__main__.CommTest testMethod=test_sequence_num_set_default_pg_gloo>, <__main__.CommTest testMethod=test_sequence_num_set_gloo_new_group>]> 2022-05-18T04:42:51.7095878Z test_broadcast_coalesced_gloo_cpu (__main__.CommTest) 2022-05-18T04:42:51.7096272Z test_broadcast_coalesced_gloo_cuda (__main__.CommTest) 2022-05-18T04:42:51.7096816Z test_gloo_barrier_device_ids (__main__.CommTest) 2022-05-18T04:42:51.7097367Z test_gloo_warn_not_in_group (__main__.CommTest) 2022-05-18T04:42:51.7098003Z test_sequence_num_incremented_gloo_default (__main__.CommTest) 2022-05-18T04:42:51.7098691Z test_sequence_num_incremented_gloo_subgroup (__main__.CommTest) 2022-05-18T04:42:51.7099379Z test_sequence_num_set_default_pg_gloo (__main__.CommTest) 2022-05-18T04:42:51.7099912Z test_sequence_num_set_gloo_new_group (__main__.CommTest) 2022-05-18T04:42:51.7106929Z , <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_dynamic_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_future_passing_cpu>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_future_passing_gpu_gloo>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_register_just_once>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_sparse_gradients>, <__main__.DistributedDataParallelTest testMethod=test_ddp_invalid_comm_hook_init>, <__main__.DistributedDataParallelTest testMethod=test_ddp_invalid_comm_hook_return_type>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_when_unused_parameters_empty>, <__main__.DistributedDataParallelTest testMethod=test_global_local_unused_params_grad>, <__main__.DistributedDataParallelTest testMethod=test_global_local_unused_params_grad_with_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_global_local_unused_params_grad_with_static_graph>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_1gpu_module_device_ids_integer_list>, 
<__main__.DistributedDataParallelTest testMethod=test_gloo_backend_1gpu_module_device_ids_torch_device_list>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_2gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_4gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_cpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_cpu_module_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_ignored_output>, <__main__.DistributedDataParallelTest testMethod=test_ignored_output_with_unused_parameters>, <__main__.DistributedDataParallelTest testMethod=test_invalid_powerSGD_state>, <__main__.DistributedDataParallelTest testMethod=test_save_load_checkpoint>, <__main__.DistributedDataParallelTest testMethod=test_sparse_gradients>, <__main__.DistributedDataParallelTest testMethod=test_sparse_gradients_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_empty_input>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_only_empty_input>]> 2022-05-18T04:42:51.7113159Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7113776Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7114278Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7114795Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7115471Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7116003Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7116569Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7117174Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7117655Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7118158Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7118859Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7119377Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7119942Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7120611Z test_ddp_comm_hook_future_passing_cpu (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7121080Z test_ddp_comm_hook_future_passing_gpu_gloo (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7121591Z test_ddp_comm_hook_register_just_once (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7122236Z test_ddp_comm_hook_sparse_gradients (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7122684Z test_ddp_invalid_comm_hook_init (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7123110Z test_ddp_invalid_comm_hook_return_type (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7123778Z test_find_unused_parameters_when_unused_parameters_empty (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7124278Z test_global_local_unused_params_grad (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7124753Z 
test_global_local_unused_params_grad_with_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7125286Z test_global_local_unused_params_grad_with_static_graph (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7125919Z test_gloo_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7126426Z test_gloo_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7126874Z test_gloo_backend_2gpu_module (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7127505Z test_gloo_backend_4gpu_module (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7127936Z test_gloo_backend_cpu_module (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7128456Z test_gloo_backend_cpu_module_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7129049Z test_ignored_output (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7129504Z test_ignored_output_with_unused_parameters (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7129955Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7130357Z test_save_load_checkpoint (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7130962Z test_sparse_gradients (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7131393Z test_sparse_gradients_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7131837Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7132347Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) 2022-05-18T04:42:51.7132825Z 2022-05-18T04:42:51.7138582Z , <__main__.ProcessGroupGlooTest testMethod=test_allgather_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_coalesced_async>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_coalesced_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_noncontiguous_input>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_stress>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_cuda_using_work_api>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_using_work_api>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_async>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_basics>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_checks_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_stress>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_stress>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_barrier_implies_wait>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_basics>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_checks>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_stress>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_stress_cuda>, <__main__.ProcessGroupGlooTest 
testMethod=test_empty_tensors>, <__main__.ProcessGroupGlooTest testMethod=test_gather_basics>, <__main__.ProcessGroupGlooTest testMethod=test_gather_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_gather_checks>, <__main__.ProcessGroupGlooTest testMethod=test_gather_noncontiguous_input>, <__main__.ProcessGroupGlooTest testMethod=test_gather_stress>, <__main__.ProcessGroupGlooTest testMethod=test_gather_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_multi_device_constructor>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_basics>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_checks>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_stress>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_round_robin>, <__main__.ProcessGroupGlooTest testMethod=test_round_robin_create_destroy>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_basics>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_checks>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_stress>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_send_recv_all_to_all>, <__main__.ProcessGroupGlooTest testMethod=test_sparse_allreduce_basics>, <__main__.ProcessGroupGlooTest testMethod=test_sparse_allreduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_sparse_allreduce_checks>]> 2022-05-18T04:42:51.7144828Z test_allgather_basics (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7145230Z test_allgather_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7145615Z test_allgather_checks (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7145981Z test_allgather_coalesced_async (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7146577Z test_allgather_coalesced_checks (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7146992Z test_allgather_noncontiguous_input (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7147367Z test_allgather_stress (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7147750Z test_allgather_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7148304Z test_allreduce_basics (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7148664Z test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7149074Z test_allreduce_basics_cuda_using_work_api (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7149499Z test_allreduce_basics_using_work_api (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7150068Z test_allreduce_checks (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7150443Z test_allreduce_coalesced_async (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7150840Z test_allreduce_coalesced_basics (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7151235Z test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7151856Z test_allreduce_coalesced_checks_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7152270Z test_allreduce_coalesced_stress (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7152656Z test_allreduce_stress (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7153014Z test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7153584Z test_barrier_implies_wait (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7153962Z test_broadcast_basics 
(__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7154340Z test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7154691Z test_broadcast_checks (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7155242Z test_broadcast_stress (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7155720Z test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7156085Z test_empty_tensors (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7156446Z test_gather_basics (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7157002Z test_gather_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7157354Z test_gather_checks (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7157743Z test_gather_noncontiguous_input (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7158120Z test_gather_stress (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7158646Z test_gather_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7159039Z test_multi_device_constructor (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7159411Z test_reduce_basics (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7159773Z test_reduce_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7160220Z test_reduce_checks (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7160672Z test_reduce_stress (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7161040Z test_reduce_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7161383Z test_round_robin (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7161762Z test_round_robin_create_destroy (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7162313Z test_scatter_basics (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7162763Z test_scatter_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7163132Z test_scatter_checks (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7163496Z test_scatter_stress (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7164057Z test_scatter_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7164418Z test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7164803Z test_sparse_allreduce_basics (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7165202Z test_sparse_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7165777Z test_sparse_allreduce_checks (__main__.ProcessGroupGlooTest) 2022-05-18T04:42:51.7166642Z , <__main__.ReducerTest testMethod=test_forward_backward_optimizer>, <__main__.ReducerTest testMethod=test_forward_backward_unused_parameters>, <__main__.ReducerTest testMethod=test_multi_dtype_multi_bucket>, <__main__.ReducerTest testMethod=test_multi_dtype_single_bucket>, <__main__.ReducerTest testMethod=test_single_dtype_single_bucket>]> 2022-05-18T04:42:51.7167654Z test_forward_backward (__main__.ReducerTest) 2022-05-18T04:42:51.7168002Z test_forward_backward_optimizer (__main__.ReducerTest) 2022-05-18T04:42:51.7168358Z test_forward_backward_unused_parameters (__main__.ReducerTest) 2022-05-18T04:42:51.7168730Z test_multi_dtype_multi_bucket (__main__.ReducerTest) 2022-05-18T04:42:51.7169257Z test_multi_dtype_single_bucket (__main__.ReducerTest) 2022-05-18T04:42:51.7169613Z test_single_dtype_single_bucket (__main__.ReducerTest) 2022-05-18T04:42:51.7170026Z ]> 2022-05-18T04:42:51.7170439Z test_logging_init (__main__.RendezvousEnvTest) 2022-05-18T04:42:51.7170943Z 2022-05-18T04:42:51.7171349Z ]> 2022-05-18T04:42:51.7171777Z test_default_store_timeout_gloo (__main__.TimeoutTest) 
2022-05-18T04:42:52.6124059Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:42:52.6137186Z 2022-05-18T04:42:52.6137343Z Running tests... 2022-05-18T04:42:52.6138448Z ---------------------------------------------------------------------- 2022-05-18T04:42:54.1900574Z test_broadcast_coalesced_gloo_cpu (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:42:54.2292239Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60523 2022-05-18T04:42:54.2399535Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60524 2022-05-18T04:42:55.1793472Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:55.1812088Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:55.4442676Z ok (2.830s) 2022-05-18T04:42:55.4442900Z 2022-05-18T04:42:55.4443307Z ---------------------------------------------------------------------- 2022-05-18T04:42:55.4443654Z Ran 1 test in 2.830s 2022-05-18T04:42:55.4443820Z 2022-05-18T04:42:55.4443918Z OK 2022-05-18T04:42:55.4444054Z 2022-05-18T04:42:55.4444168Z Generating XML reports... 2022-05-18T04:42:55.4486608Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044252.xml 2022-05-18T04:42:56.6213291Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:42:56.6227153Z 2022-05-18T04:42:56.6227290Z Running tests... 2022-05-18T04:42:56.6227777Z ---------------------------------------------------------------------- 2022-05-18T04:42:58.2014486Z test_broadcast_coalesced_gloo_cuda (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:42:58.2403100Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60632 2022-05-18T04:42:58.2511043Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60633 2022-05-18T04:42:59.1534998Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:59.1667725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:00.7584740Z ok (4.135s) 2022-05-18T04:43:00.7585105Z 2022-05-18T04:43:00.7585806Z ---------------------------------------------------------------------- 2022-05-18T04:43:00.7586422Z Ran 1 test in 4.136s 2022-05-18T04:43:00.7586733Z 2022-05-18T04:43:00.7586911Z OK 2022-05-18T04:43:00.7587150Z 2022-05-18T04:43:00.7587381Z Generating XML reports... 2022-05-18T04:43:00.7630432Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044256.xml 2022-05-18T04:43:01.9569738Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:01.9583267Z 2022-05-18T04:43:01.9583530Z Running tests... 2022-05-18T04:43:01.9584314Z ---------------------------------------------------------------------- 2022-05-18T04:43:03.5402628Z test_gloo_barrier_device_ids (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:03.5798495Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60743 2022-05-18T04:43:03.5907232Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60744 2022-05-18T04:43:04.4952602Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:04.5343000Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:04.5466241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:04.5466766Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:04.5467556Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:04.5468252Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:04.7951536Z ok (2.836s) 2022-05-18T04:43:04.7951741Z 2022-05-18T04:43:04.7952149Z ---------------------------------------------------------------------- 2022-05-18T04:43:04.7952476Z Ran 1 test in 2.837s 2022-05-18T04:43:04.7952645Z 2022-05-18T04:43:04.7952741Z OK 2022-05-18T04:43:04.7952876Z 2022-05-18T04:43:04.7953007Z Generating XML reports... 2022-05-18T04:43:04.7994292Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044301.xml 2022-05-18T04:43:05.9748891Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:05.9762902Z 2022-05-18T04:43:05.9763048Z Running tests... 2022-05-18T04:43:05.9763493Z ---------------------------------------------------------------------- 2022-05-18T04:43:07.5657631Z test_gloo_warn_not_in_group (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:07.6044842Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60852 2022-05-18T04:43:07.6154801Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60853 2022-05-18T04:43:08.5100643Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:08.5127216Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:08.5310356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:08.5310915Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:08.5311697Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:08.5312399Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:08.5316668Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:08.5414043Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:08.5414716Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2022-05-18T04:43:08.5419628Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:10.1228949Z ok (4.146s) 2022-05-18T04:43:10.1229208Z 2022-05-18T04:43:10.1229884Z ---------------------------------------------------------------------- 2022-05-18T04:43:10.1230278Z Ran 1 test in 4.147s 2022-05-18T04:43:10.1230444Z 2022-05-18T04:43:10.1230542Z OK 2022-05-18T04:43:10.1230658Z 2022-05-18T04:43:10.1230796Z Generating XML reports... 2022-05-18T04:43:10.1271809Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044305.xml 2022-05-18T04:43:11.2968869Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:11.2982803Z 2022-05-18T04:43:11.2983092Z Running tests... 2022-05-18T04:43:11.2983535Z ---------------------------------------------------------------------- 2022-05-18T04:43:12.8724174Z test_sequence_num_incremented_gloo_default (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:12.9120153Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60966 2022-05-18T04:43:12.9228256Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60967 2022-05-18T04:43:13.8159317Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:13.8542838Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:13.8677606Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:13.8678113Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:13.8678911Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:13.8679609Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:13.8786076Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:13.8786596Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:13.8787279Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:13.8788498Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:15.4301726Z ok (4.132s) 2022-05-18T04:43:15.4301941Z 2022-05-18T04:43:15.4302341Z ---------------------------------------------------------------------- 2022-05-18T04:43:15.4302667Z Ran 1 test in 4.132s 2022-05-18T04:43:15.4302837Z 2022-05-18T04:43:15.4302938Z OK 2022-05-18T04:43:15.4303077Z 2022-05-18T04:43:15.4303212Z Generating XML reports... 2022-05-18T04:43:15.4345747Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044311.xml 2022-05-18T04:43:16.6279017Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:16.6292988Z 2022-05-18T04:43:16.6293131Z Running tests... 
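[Editor's note] For context on the CommTest cases above: each test spawns two worker processes that join a gloo process group, and the "store_based_barrier_key" INFO lines are emitted while that group is being set up. A minimal sketch of spinning up a comparable two-rank gloo group outside the test harness (not taken from the test file; the rendezvous address and port are placeholders):

    import os
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def worker(rank: int, world_size: int) -> None:
        os.environ["MASTER_ADDR"] = "127.0.0.1"   # placeholder rendezvous address
        os.environ["MASTER_PORT"] = "29500"       # placeholder port
        # init_process_group performs the store-based barrier whose
        # "store_based_barrier_key" INFO lines appear in the log above.
        dist.init_process_group("gloo", rank=rank, world_size=world_size)
        dist.barrier()                            # collective barrier on the gloo group
        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(worker, args=(2,), nprocs=2)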
2022-05-18T04:43:16.6293839Z ---------------------------------------------------------------------- 2022-05-18T04:43:18.2184270Z test_sequence_num_incremented_gloo_subgroup (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:18.2581649Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61083 2022-05-18T04:43:18.2689568Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61084 2022-05-18T04:43:19.1836887Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:19.2140117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:19.3732251Z skip: Need at least 4 CUDA devices (2.744s) 2022-05-18T04:43:19.3732607Z 2022-05-18T04:43:19.3733175Z ---------------------------------------------------------------------- 2022-05-18T04:43:19.3733530Z Ran 1 test in 2.744s 2022-05-18T04:43:19.3733678Z 2022-05-18T04:43:19.3733793Z OK (skipped=1) 2022-05-18T04:43:19.3733951Z 2022-05-18T04:43:19.3734079Z Generating XML reports... 2022-05-18T04:43:19.3776092Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044316.xml 2022-05-18T04:43:20.5277241Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:20.5292088Z 2022-05-18T04:43:20.5292486Z Running tests... 2022-05-18T04:43:20.5293006Z ---------------------------------------------------------------------- 2022-05-18T04:43:22.0942595Z test_sequence_num_set_default_pg_gloo (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:22.1330674Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61186 2022-05-18T04:43:22.1438146Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61187 2022-05-18T04:43:23.0318406Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:23.0402389Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:23.0528191Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:23.0528730Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:23.0529524Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:23.0530209Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:23.2479807Z ok (2.718s) 2022-05-18T04:43:23.2480015Z 2022-05-18T04:43:23.2480407Z ---------------------------------------------------------------------- 2022-05-18T04:43:23.2480731Z Ran 1 test in 2.719s 2022-05-18T04:43:23.2481196Z 2022-05-18T04:43:23.2481306Z OK 2022-05-18T04:43:23.2481457Z 2022-05-18T04:43:23.2481592Z Generating XML reports... 2022-05-18T04:43:23.2523240Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044320.xml 2022-05-18T04:43:24.3680784Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:24.3694501Z 2022-05-18T04:43:24.3694808Z Running tests... 
2022-05-18T04:43:24.3695269Z ---------------------------------------------------------------------- 2022-05-18T04:43:25.9538587Z test_sequence_num_set_gloo_new_group (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:25.9927161Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61295 2022-05-18T04:43:26.0034751Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61296 2022-05-18T04:43:26.9142734Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:26.9143271Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:26.9253662Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:26.9254183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:26.9255193Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:26.9255892Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:26.9463328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:26.9463824Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:26.9464690Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:26.9465381Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:27.2078309Z ok (2.838s) 2022-05-18T04:43:27.2078510Z 2022-05-18T04:43:27.2079119Z ---------------------------------------------------------------------- 2022-05-18T04:43:27.2079468Z Ran 1 test in 2.838s 2022-05-18T04:43:27.2079634Z 2022-05-18T04:43:27.2079735Z OK 2022-05-18T04:43:27.2079875Z 2022-05-18T04:43:27.2080004Z Generating XML reports... 2022-05-18T04:43:27.2121931Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044324.xml 2022-05-18T04:43:28.3547207Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:28.3561227Z 2022-05-18T04:43:28.3561470Z Running tests... 2022-05-18T04:43:28.3561924Z ---------------------------------------------------------------------- 2022-05-18T04:43:28.3569966Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2022-05-18T04:43:29.9305928Z Dynamic module can be checkpointed, multiple times, with non-reentrant ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:29.9694163Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61410 2022-05-18T04:43:29.9801113Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61411 2022-05-18T04:43:30.8822863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:30.9151766Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:32.2229727Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnksb3vqw 2022-05-18T04:43:32.2230338Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnksb3vqw/_remote_module_non_scriptable.py 2022-05-18T04:43:32.2435886Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4_a8ba5i 2022-05-18T04:43:32.2438661Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4_a8ba5i/_remote_module_non_scriptable.py 2022-05-18T04:43:32.7878587Z ok (4.431s) 2022-05-18T04:43:32.7878898Z 2022-05-18T04:43:32.7879441Z ---------------------------------------------------------------------- 2022-05-18T04:43:32.7879782Z Ran 1 test in 4.432s 2022-05-18T04:43:32.7879946Z 2022-05-18T04:43:32.7880039Z OK 2022-05-18T04:43:32.7880171Z 2022-05-18T04:43:32.7880299Z Generating XML reports... 2022-05-18T04:43:32.7923795Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044328.xml 2022-05-18T04:43:33.9816482Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:33.9830695Z 2022-05-18T04:43:33.9831128Z Running tests... 2022-05-18T04:43:33.9831626Z ---------------------------------------------------------------------- 2022-05-18T04:43:33.9839563Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:43:35.5582664Z Dynamic module can be checkpointed multiple times with weight sharing ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:35.5978250Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61525 2022-05-18T04:43:35.6085869Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61526 2022-05-18T04:43:36.5113003Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:36.5450658Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:37.8642169Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb_z5fggv 2022-05-18T04:43:37.8643328Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb_z5fggv/_remote_module_non_scriptable.py 2022-05-18T04:43:37.8657929Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8w3zwmx7 2022-05-18T04:43:37.8660883Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8w3zwmx7/_remote_module_non_scriptable.py 2022-05-18T04:43:38.4165820Z ok (4.433s) 2022-05-18T04:43:38.4166042Z 2022-05-18T04:43:38.4166717Z ---------------------------------------------------------------------- 2022-05-18T04:43:38.4167490Z Ran 1 test in 4.433s 2022-05-18T04:43:38.4167777Z 2022-05-18T04:43:38.4167871Z OK 2022-05-18T04:43:38.4168011Z 2022-05-18T04:43:38.4168146Z Generating XML reports... 
2022-05-18T04:43:38.4213321Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044333.xml 2022-05-18T04:43:39.6203021Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:39.6216770Z 2022-05-18T04:43:39.6217169Z Running tests... 2022-05-18T04:43:39.6217663Z ---------------------------------------------------------------------- 2022-05-18T04:43:39.6228299Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:43:41.1935347Z DDP works as expected when layer is checkpointed only once. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:41.2332670Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61640 2022-05-18T04:43:41.2441479Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61641 2022-05-18T04:43:42.1703751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:42.1886595Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:43.4676980Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpidssfcyq 2022-05-18T04:43:43.4677905Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpidssfcyq/_remote_module_non_scriptable.py 2022-05-18T04:43:43.4898867Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1fmoiffd 2022-05-18T04:43:43.4901514Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1fmoiffd/_remote_module_non_scriptable.py 2022-05-18T04:43:43.6823351Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:43.6824319Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:43.7138134Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:43.7138867Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:43.7314833Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:43:43.7315634Z warnings.warn( 2022-05-18T04:43:43.7316718Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:43:43.7317696Z warnings.warn( 2022-05-18T04:43:43.7417939Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:43.7418999Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:43.7642163Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:43.7643194Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:43:43.7952284Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:43.7953313Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:43.8223354Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:43.8224521Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:44.1523231Z ok (4.530s) 2022-05-18T04:43:44.1523631Z 2022-05-18T04:43:44.1524364Z ---------------------------------------------------------------------- 2022-05-18T04:43:44.1524828Z Ran 1 test in 4.531s 2022-05-18T04:43:44.1524995Z 2022-05-18T04:43:44.1525091Z OK 2022-05-18T04:43:44.1525225Z 2022-05-18T04:43:44.1525358Z Generating XML reports... 2022-05-18T04:43:44.1567842Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044339.xml 2022-05-18T04:43:45.3358168Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:45.3371062Z 2022-05-18T04:43:45.3371505Z Running tests... 2022-05-18T04:43:45.3382436Z ---------------------------------------------------------------------- 2022-05-18T04:43:45.3383228Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:43:46.9006637Z DDP works as expected when layer is checkpointed only once. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:46.9399690Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61755 2022-05-18T04:43:46.9508752Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61756 2022-05-18T04:43:47.8434193Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:47.8729156Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:49.1720204Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_p990ei9 2022-05-18T04:43:49.1720850Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_p990ei9/_remote_module_non_scriptable.py 2022-05-18T04:43:49.1725358Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi2aaw1ce 2022-05-18T04:43:49.1728309Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi2aaw1ce/_remote_module_non_scriptable.py 2022-05-18T04:43:49.3658834Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.3659389Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.3987496Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.3987976Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.4154641Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 
2022-05-18T04:43:49.4155401Z warnings.warn( 2022-05-18T04:43:49.4156445Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:43:49.4157468Z warnings.warn( 2022-05-18T04:43:49.4270727Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.4271232Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.4497651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.4498145Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.4821490Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.4822033Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.5096893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.5097402Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:49.8590100Z ok (4.522s) 2022-05-18T04:43:49.8590308Z 2022-05-18T04:43:49.8590728Z ---------------------------------------------------------------------- 2022-05-18T04:43:49.8591053Z Ran 1 test in 4.522s 2022-05-18T04:43:49.8591220Z 2022-05-18T04:43:49.8591318Z OK 2022-05-18T04:43:49.8591456Z 2022-05-18T04:43:49.8591612Z Generating XML reports... 2022-05-18T04:43:49.8634305Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044345.xml 2022-05-18T04:43:51.0613412Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:51.0627438Z 2022-05-18T04:43:51.0627601Z Running tests... 2022-05-18T04:43:51.0628846Z ---------------------------------------------------------------------- 2022-05-18T04:43:51.0637158Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:43:52.6296292Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:52.6681940Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61870 2022-05-18T04:43:52.6790827Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61871 2022-05-18T04:43:53.5909354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:53.6256002Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:54.9375182Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpasigmcts 2022-05-18T04:43:54.9376191Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpasigmcts/_remote_module_non_scriptable.py 2022-05-18T04:43:54.9675917Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeg63eub4 2022-05-18T04:43:54.9678933Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeg63eub4/_remote_module_non_scriptable.py 2022-05-18T04:43:55.1666204Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:55.1666731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:55.1979003Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:55.1979753Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:43:55.4870011Z ok (4.424s) 2022-05-18T04:43:55.4870218Z 2022-05-18T04:43:55.4870768Z ---------------------------------------------------------------------- 2022-05-18T04:43:55.4871121Z Ran 1 test in 4.424s 2022-05-18T04:43:55.4871647Z 2022-05-18T04:43:55.4871748Z OK 2022-05-18T04:43:55.4871891Z 2022-05-18T04:43:55.4872028Z Generating XML reports... 2022-05-18T04:43:55.4913741Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044351.xml 2022-05-18T04:43:56.6969549Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:43:56.6983560Z 2022-05-18T04:43:56.6983826Z Running tests... 2022-05-18T04:43:56.6984474Z ---------------------------------------------------------------------- 2022-05-18T04:43:56.6992611Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:43:58.2968134Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:43:58.3366949Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61985 2022-05-18T04:43:58.3475440Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61986 2022-05-18T04:43:59.2946695Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:59.3076328Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:00.6280080Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp06eyhbdh 2022-05-18T04:44:00.6280693Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp06eyhbdh/_remote_module_non_scriptable.py 2022-05-18T04:44:00.6312554Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8x1257xo 2022-05-18T04:44:00.6315398Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8x1257xo/_remote_module_non_scriptable.py 2022-05-18T04:44:00.8380314Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:00.8380865Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:00.8725965Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:00.8726480Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:01.2557557Z ok (4.557s) 2022-05-18T04:44:01.2558005Z 2022-05-18T04:44:01.2558509Z ---------------------------------------------------------------------- 2022-05-18T04:44:01.2558859Z Ran 1 test in 4.557s 2022-05-18T04:44:01.2559038Z 2022-05-18T04:44:01.2559139Z OK 2022-05-18T04:44:01.2559281Z 2022-05-18T04:44:01.2559415Z Generating XML reports... 2022-05-18T04:44:01.2603006Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044356.xml 2022-05-18T04:44:02.4790089Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:02.4803646Z 2022-05-18T04:44:02.4803860Z Running tests... 2022-05-18T04:44:02.4804296Z ---------------------------------------------------------------------- 2022-05-18T04:44:02.4815744Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:44:04.0569790Z Checkpointing twice fails for non-static graph with reentrant checkpoint ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:44:04.0967924Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62100 2022-05-18T04:44:04.1078585Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62101 2022-05-18T04:44:05.0210397Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:05.0220475Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:06.3383650Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5i3b_khj 2022-05-18T04:44:06.3384430Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5i3b_khj/_remote_module_non_scriptable.py 2022-05-18T04:44:06.3538148Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppsdbdu7g 2022-05-18T04:44:06.3540834Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppsdbdu7g/_remote_module_non_scriptable.py 2022-05-18T04:44:06.5505497Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:06.5506076Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:06.5755733Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T04:44:06.5757291Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T04:44:06.6128207Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:06.6128730Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:07.0160962Z ok (4.535s) 2022-05-18T04:44:07.0161286Z 2022-05-18T04:44:07.0161831Z ---------------------------------------------------------------------- 2022-05-18T04:44:07.0162180Z Ran 1 test in 4.536s 2022-05-18T04:44:07.0162365Z 2022-05-18T04:44:07.0162460Z OK 2022-05-18T04:44:07.0162577Z 2022-05-18T04:44:07.0164755Z Generating XML reports... 2022-05-18T04:44:07.0205305Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044402.xml 2022-05-18T04:44:08.2231657Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:08.2245423Z 2022-05-18T04:44:08.2245731Z Running tests... 
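[Editor's note] For context on the test_ddp_checkpointing_* cases above: each one wraps a small module in DistributedDataParallel and re-runs layers through torch.utils.checkpoint, which is what produces the find_unused_parameters and _set_static_graph warnings logged above. A minimal sketch, assuming a PyTorch build where torch.utils.checkpoint.checkpoint accepts use_reentrant; the module shape and names are illustrative and not the test's actual code:

    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.checkpoint import checkpoint

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(20, 20)

        def forward(self, x):
            # Checkpoint the same layer twice; the non-reentrant impl supports
            # this directly, while the reentrant impl needs a static graph
            # (the situation the warnings above describe).
            x = checkpoint(self.layer, x, use_reentrant=False)
            return checkpoint(self.layer, x, use_reentrant=False)

    # After dist.init_process_group("gloo", ...) on each rank:
    # model = DDP(Net())           # optionally find_unused_parameters=True
    # model._set_static_graph()    # the call the UserWarning above refers to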
2022-05-18T04:44:08.2246169Z ---------------------------------------------------------------------- 2022-05-18T04:44:08.2257507Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:44:09.8250447Z Checkpointing twice fails for non-static graph with reentrant checkpoint ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:44:09.8649070Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62215 2022-05-18T04:44:09.8759231Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62216 2022-05-18T04:44:10.7722216Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:10.8250919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:12.1470247Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl1_8mesm 2022-05-18T04:44:12.1470866Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl1_8mesm/_remote_module_non_scriptable.py 2022-05-18T04:44:12.1628579Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn6386mlz 2022-05-18T04:44:12.1631159Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn6386mlz/_remote_module_non_scriptable.py 2022-05-18T04:44:12.6840750Z ok (4.459s) 2022-05-18T04:44:12.6840951Z 2022-05-18T04:44:12.6841346Z ---------------------------------------------------------------------- 2022-05-18T04:44:12.6841684Z Ran 1 test in 4.460s 2022-05-18T04:44:12.6841848Z 2022-05-18T04:44:12.6842246Z OK 2022-05-18T04:44:12.6842385Z 2022-05-18T04:44:12.6842520Z Generating XML reports... 2022-05-18T04:44:12.6886293Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044408.xml 2022-05-18T04:44:13.8792356Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:13.8805615Z 2022-05-18T04:44:13.8805822Z Running tests... 2022-05-18T04:44:13.8806743Z ---------------------------------------------------------------------- 2022-05-18T04:44:13.8814441Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:44:15.4296648Z Checkpointing should work with static graph in the case of checkpointing ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:44:15.4684496Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62330 2022-05-18T04:44:15.4792815Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62331 2022-05-18T04:44:16.3810225Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:16.4252639Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:17.7496038Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2htzpfyp 2022-05-18T04:44:17.7497048Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2htzpfyp/_remote_module_non_scriptable.py 2022-05-18T04:44:17.7504876Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpamzwwxn9 2022-05-18T04:44:17.7508218Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpamzwwxn9/_remote_module_non_scriptable.py 2022-05-18T04:44:17.9538455Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:44:17.9538996Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:17.9848543Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:17.9849033Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:18.2873056Z ok (4.406s) 2022-05-18T04:44:18.2873303Z 2022-05-18T04:44:18.2873720Z ---------------------------------------------------------------------- 2022-05-18T04:44:18.2874059Z Ran 1 test in 4.407s 2022-05-18T04:44:18.2874226Z 2022-05-18T04:44:18.2874328Z OK 2022-05-18T04:44:18.2874467Z 2022-05-18T04:44:18.2874587Z Generating XML reports... 2022-05-18T04:44:18.2918578Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044413.xml 2022-05-18T04:44:19.4729103Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:19.4742959Z 2022-05-18T04:44:19.4743104Z Running tests... 2022-05-18T04:44:19.4743546Z ---------------------------------------------------------------------- 2022-05-18T04:44:19.4755757Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:44:21.0600214Z With reentrant autograd checkpointing impl, DDP will fail when there are ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:44:21.0996620Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62445 2022-05-18T04:44:21.1105604Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62446 2022-05-18T04:44:22.0128401Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:22.0168422Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:23.3080129Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_zhg1j96 2022-05-18T04:44:23.3080893Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_zhg1j96/_remote_module_non_scriptable.py 2022-05-18T04:44:23.3265466Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5i5rdi0b 2022-05-18T04:44:23.3268661Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5i5rdi0b/_remote_module_non_scriptable.py 2022-05-18T04:44:23.5123968Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T04:44:23.5175714Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2022-05-18T04:44:23.5456042Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:44:23.5456792Z warnings.warn( 2022-05-18T04:44:23.5457854Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:44:23.5458576Z warnings.warn( 2022-05-18T04:44:23.5567017Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:23.5567522Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:23.6099774Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:23.6100554Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:23.9185397Z ok (4.444s) 2022-05-18T04:44:23.9185761Z 2022-05-18T04:44:23.9186453Z ---------------------------------------------------------------------- 2022-05-18T04:44:23.9187080Z Ran 1 test in 4.444s 2022-05-18T04:44:23.9187345Z 2022-05-18T04:44:23.9187520Z OK 2022-05-18T04:44:23.9187770Z 2022-05-18T04:44:23.9188017Z Generating XML reports... 2022-05-18T04:44:23.9231948Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044419.xml 2022-05-18T04:44:25.0993988Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:25.1006946Z 2022-05-18T04:44:25.1007351Z Running tests... 2022-05-18T04:44:25.1007852Z ---------------------------------------------------------------------- 2022-05-18T04:44:25.1018605Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:44:26.6456505Z With reentrant autograd checkpointing impl, DDP will fail when there are ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:44:26.6844451Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62560 2022-05-18T04:44:26.6952640Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62561 2022-05-18T04:44:27.5964279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:27.6297890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:28.9380795Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkc74do96 2022-05-18T04:44:28.9381416Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkc74do96/_remote_module_non_scriptable.py 2022-05-18T04:44:28.9444419Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnh7njhog 2022-05-18T04:44:28.9447314Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnh7njhog/_remote_module_non_scriptable.py 2022-05-18T04:44:29.1456241Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:44:29.1457062Z warnings.warn( 2022-05-18T04:44:29.1458122Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:44:29.1458835Z warnings.warn( 2022-05-18T04:44:29.1589608Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:29.1590100Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:29.2011095Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:29.2011585Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:29.5031526Z ok (4.402s) 2022-05-18T04:44:29.5031743Z 2022-05-18T04:44:29.5032150Z ---------------------------------------------------------------------- 2022-05-18T04:44:29.5032488Z Ran 1 test in 4.402s 2022-05-18T04:44:29.5032638Z 2022-05-18T04:44:29.5032734Z OK 2022-05-18T04:44:29.5034599Z 2022-05-18T04:44:29.5035012Z Generating XML reports... 2022-05-18T04:44:29.5075960Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044425.xml 2022-05-18T04:44:30.6927313Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:30.6940719Z 2022-05-18T04:44:30.6941330Z Running tests... 2022-05-18T04:44:30.6941824Z ---------------------------------------------------------------------- 2022-05-18T04:44:30.6954776Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:44:32.2818751Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:44:32.3215419Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62675 2022-05-18T04:44:32.3321785Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62676 2022-05-18T04:44:33.2283858Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:33.2606601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:34.5737221Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv7gs1wmf 2022-05-18T04:44:34.5738380Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv7gs1wmf/_remote_module_non_scriptable.py 2022-05-18T04:44:34.5909131Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphjrp_0re 2022-05-18T04:44:34.5911362Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphjrp_0re/_remote_module_non_scriptable.py 2022-05-18T04:44:34.7857266Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:34.7857809Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:34.8212384Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:34.8212882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:35.1414978Z ok (4.447s) 2022-05-18T04:44:35.1415205Z 2022-05-18T04:44:35.1415810Z ---------------------------------------------------------------------- 2022-05-18T04:44:35.1416153Z Ran 1 test in 4.447s 2022-05-18T04:44:35.1416319Z 2022-05-18T04:44:35.1416413Z OK 2022-05-18T04:44:35.1416557Z 2022-05-18T04:44:35.1416689Z Generating XML reports... 2022-05-18T04:44:35.1459790Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044430.xml 2022-05-18T04:44:36.3411456Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:36.3425085Z 2022-05-18T04:44:36.3425452Z Running tests... 2022-05-18T04:44:36.3425940Z ---------------------------------------------------------------------- 2022-05-18T04:44:36.3438983Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:44:37.9151294Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:44:37.9552445Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62790 2022-05-18T04:44:37.9660893Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62791 2022-05-18T04:44:38.8663055Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:38.8753069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:40.1931046Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy0aqy0e6 2022-05-18T04:44:40.1932154Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy0aqy0e6/_remote_module_non_scriptable.py 2022-05-18T04:44:40.1937045Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps3yyf4zz 2022-05-18T04:44:40.1940111Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps3yyf4zz/_remote_module_non_scriptable.py 2022-05-18T04:44:40.3895640Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:40.3896518Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:40.4210493Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:40.4210990Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:40.4428254Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:40.4428760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:40.4738611Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:40.4739109Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:40.7740532Z ok (4.431s) 2022-05-18T04:44:40.7740737Z 2022-05-18T04:44:40.7741150Z ---------------------------------------------------------------------- 2022-05-18T04:44:40.7741494Z Ran 1 test in 4.431s 2022-05-18T04:44:40.7741667Z 2022-05-18T04:44:40.7741746Z OK 2022-05-18T04:44:40.7741973Z 2022-05-18T04:44:40.7742205Z Generating XML reports... 2022-05-18T04:44:40.7787221Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044436.xml 2022-05-18T04:44:41.9550071Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:41.9563592Z 2022-05-18T04:44:41.9563982Z Running tests... 2022-05-18T04:44:41.9564473Z ---------------------------------------------------------------------- 2022-05-18T04:44:41.9573032Z test_ddp_comm_hook_future_passing_cpu (__main__.DistributedDataParallelTest) 2022-05-18T04:44:43.5085728Z This unit test verifies whether the Future object is passed properly. ... 
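[Annotation] test_ddp_comm_hook_future_passing_cpu, introduced just above, checks that a communication hook's Future is threaded through the reducer. A hedged sketch of that mechanism, assuming a single-process gloo group so it stays self-contained; the hook name and the MASTER_PORT value are arbitrary:

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def passthrough_hook(state, bucket):
    # A DDP comm hook receives a flattened gradient bucket and must return a
    # torch.futures.Future; its value becomes the reduced gradients. Here we
    # hand the bucket back unchanged just to show how the Future is passed.
    fut = torch.futures.Future()
    fut.set_result(bucket.buffer())
    return fut

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

ddp = DDP(nn.Linear(8, 8))
ddp.register_comm_hook(state=None, hook=passthrough_hook)
ddp(torch.randn(4, 8)).sum().backward()
dist.destroy_process_group()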
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:44:43.5474354Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62905 2022-05-18T04:44:43.5580017Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62906 2022-05-18T04:44:44.4675960Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:44.5035684Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:44.5248815Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz6juwkgq 2022-05-18T04:44:44.5251626Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz6juwkgq/_remote_module_non_scriptable.py 2022-05-18T04:44:44.5252195Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpchlb_dhl 2022-05-18T04:44:44.5255227Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpchlb_dhl/_remote_module_non_scriptable.py 2022-05-18T04:44:44.7625517Z ok (2.806s) 2022-05-18T04:44:44.7625823Z 2022-05-18T04:44:44.7626232Z ---------------------------------------------------------------------- 2022-05-18T04:44:44.7626577Z Ran 1 test in 2.806s 2022-05-18T04:44:44.7626725Z 2022-05-18T04:44:44.7626831Z OK 2022-05-18T04:44:44.7626964Z 2022-05-18T04:44:44.7627111Z Generating XML reports... 2022-05-18T04:44:44.7670048Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044441.xml 2022-05-18T04:44:45.9401627Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:45.9415773Z 2022-05-18T04:44:45.9416179Z Running tests... 2022-05-18T04:44:45.9416687Z ---------------------------------------------------------------------- 2022-05-18T04:44:45.9425021Z test_ddp_comm_hook_future_passing_gpu_gloo (__main__.DistributedDataParallelTest) 2022-05-18T04:44:47.5054476Z This unit test verifies whether the Future object is passed properly using gloo backend. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:44:47.5443081Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63018 2022-05-18T04:44:47.5550898Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63019 2022-05-18T04:44:48.4580113Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:48.4919395Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:49.8054874Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyjh_u5bh 2022-05-18T04:44:49.8055507Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyjh_u5bh/_remote_module_non_scriptable.py 2022-05-18T04:44:49.8066163Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5jwektmh 2022-05-18T04:44:49.8069007Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5jwektmh/_remote_module_non_scriptable.py 2022-05-18T04:44:50.1625840Z ok (4.221s) 2022-05-18T04:44:50.1626055Z 2022-05-18T04:44:50.1626848Z ---------------------------------------------------------------------- 2022-05-18T04:44:50.1627231Z Ran 1 test in 4.221s 2022-05-18T04:44:50.1627403Z 2022-05-18T04:44:50.1627502Z OK 2022-05-18T04:44:50.1627668Z 2022-05-18T04:44:50.1627788Z Generating XML reports... 
2022-05-18T04:44:50.1671635Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044445.xml 2022-05-18T04:44:51.3697197Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:51.3711465Z 2022-05-18T04:44:51.3711930Z Running tests... 2022-05-18T04:44:51.3712356Z ---------------------------------------------------------------------- 2022-05-18T04:44:51.3722679Z test_ddp_comm_hook_register_just_once (__main__.DistributedDataParallelTest) 2022-05-18T04:44:52.9503954Z DDP communication hook can only be registered once. This test validates whether ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:44:52.9898947Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63133 2022-05-18T04:44:53.0007610Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63134 2022-05-18T04:44:53.9027036Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:53.9516509Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:53.9746118Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvncuc6v1 2022-05-18T04:44:53.9747212Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7e9advtm 2022-05-18T04:44:53.9748510Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvncuc6v1/_remote_module_non_scriptable.py 2022-05-18T04:44:53.9750205Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7e9advtm/_remote_module_non_scriptable.py 2022-05-18T04:44:54.2050459Z ok (2.834s) 2022-05-18T04:44:54.2050642Z 2022-05-18T04:44:54.2051041Z ---------------------------------------------------------------------- 2022-05-18T04:44:54.2051390Z Ran 1 test in 2.834s 2022-05-18T04:44:54.2051555Z 2022-05-18T04:44:54.2051652Z OK 2022-05-18T04:44:54.2051768Z 2022-05-18T04:44:54.2051913Z Generating XML reports... 2022-05-18T04:44:54.2095439Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044451.xml 2022-05-18T04:44:55.3465856Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:55.3479901Z 2022-05-18T04:44:55.3480068Z Running tests... 2022-05-18T04:44:55.3480515Z ---------------------------------------------------------------------- 2022-05-18T04:44:55.3493540Z test_ddp_comm_hook_sparse_gradients (__main__.DistributedDataParallelTest) 2022-05-18T04:44:56.9329257Z Runs "test_sparse_gradients" unit test with DDP communication hook. We define a ... 
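[Annotation] For context on the "sparse gradients" that test_ddp_comm_hook_sparse_gradients (introduced above) feeds through a comm hook, a small standalone sketch with made-up layer sizes and no DDP involved: an embedding created with sparse=True produces sparse COO gradients, which the reducer and any registered hook must handle.

import torch
import torch.nn as nn

emb = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, sparse=True)
loss = emb(torch.tensor([[1, 2, 3], [4, 5, 6]])).sum()
loss.backward()
print(emb.weight.grad.is_sparse)  # True: the gradient only covers the rows that were used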
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:44:56.9723721Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63242 2022-05-18T04:44:56.9832114Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63243 2022-05-18T04:44:57.9374484Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:57.9387102Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:57.9694718Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp81qyzd08 2022-05-18T04:44:57.9697416Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7a_jl2e_ 2022-05-18T04:44:57.9697981Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp81qyzd08/_remote_module_non_scriptable.py 2022-05-18T04:44:57.9700239Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7a_jl2e_/_remote_module_non_scriptable.py 2022-05-18T04:44:58.1876414Z ok (2.839s) 2022-05-18T04:44:58.1876622Z 2022-05-18T04:44:58.1877035Z ---------------------------------------------------------------------- 2022-05-18T04:44:58.1877398Z Ran 1 test in 2.840s 2022-05-18T04:44:58.1877547Z 2022-05-18T04:44:58.1877643Z OK 2022-05-18T04:44:58.1877781Z 2022-05-18T04:44:58.1877927Z Generating XML reports... 2022-05-18T04:44:58.1921583Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044455.xml 2022-05-18T04:44:59.3725183Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:44:59.3738887Z 2022-05-18T04:44:59.3739343Z Running tests... 2022-05-18T04:44:59.3740204Z ---------------------------------------------------------------------- 2022-05-18T04:44:59.3751172Z test_ddp_invalid_comm_hook_init (__main__.DistributedDataParallelTest) 2022-05-18T04:45:00.9607352Z This unit test makes sure that register_comm_hook properly checks the format ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:01.0004888Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63385 2022-05-18T04:45:01.0113630Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63386 2022-05-18T04:45:01.8998234Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:01.9519953Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:01.9733385Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxhhf8o7a 2022-05-18T04:45:01.9733926Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8_9f4nz4 2022-05-18T04:45:01.9735845Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxhhf8o7a/_remote_module_non_scriptable.py 2022-05-18T04:45:01.9736403Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8_9f4nz4/_remote_module_non_scriptable.py 2022-05-18T04:45:02.2158798Z ok (2.842s) 2022-05-18T04:45:02.2159005Z 2022-05-18T04:45:02.2159406Z ---------------------------------------------------------------------- 2022-05-18T04:45:02.2159735Z Ran 1 test in 2.842s 2022-05-18T04:45:02.2159900Z 2022-05-18T04:45:02.2160000Z OK 2022-05-18T04:45:02.2160136Z 2022-05-18T04:45:02.2160285Z Generating XML reports... 
2022-05-18T04:45:02.2205417Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044459.xml 2022-05-18T04:45:03.4209193Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:03.4222506Z 2022-05-18T04:45:03.4222772Z Running tests... 2022-05-18T04:45:03.4223473Z ---------------------------------------------------------------------- 2022-05-18T04:45:03.4237311Z test_ddp_invalid_comm_hook_return_type (__main__.DistributedDataParallelTest) 2022-05-18T04:45:05.0042448Z This test checks whether return annotation checked properly if defined. It also ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:05.0439077Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63494 2022-05-18T04:45:05.0548506Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63495 2022-05-18T04:45:05.9508621Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:05.9576964Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:05.9892264Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkqshdth3 2022-05-18T04:45:05.9892818Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpze07r4w9 2022-05-18T04:45:05.9894931Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkqshdth3/_remote_module_non_scriptable.py 2022-05-18T04:45:05.9895468Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpze07r4w9/_remote_module_non_scriptable.py 2022-05-18T04:45:06.1591185Z ok (2.737s) 2022-05-18T04:45:06.1591388Z 2022-05-18T04:45:06.1591783Z ---------------------------------------------------------------------- 2022-05-18T04:45:06.1592149Z Ran 1 test in 2.737s 2022-05-18T04:45:06.1592317Z 2022-05-18T04:45:06.1592393Z OK 2022-05-18T04:45:06.1592535Z 2022-05-18T04:45:06.1592692Z Generating XML reports... 2022-05-18T04:45:06.1637384Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044503.xml 2022-05-18T04:45:07.3586035Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:07.3600197Z 2022-05-18T04:45:07.3600620Z Running tests... 2022-05-18T04:45:07.3601465Z ---------------------------------------------------------------------- 2022-05-18T04:45:07.3620131Z test_find_unused_parameters_when_unused_parameters_empty (__main__.DistributedDataParallelTest) 2022-05-18T04:45:08.9488134Z An empty unused_parameters array does not imply find_unused_parameters = ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:08.9885675Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63607 2022-05-18T04:45:08.9994500Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63608 2022-05-18T04:45:09.9135577Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:09.9415059Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:09.9654633Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqe7fusam 2022-05-18T04:45:09.9656652Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqe7fusam/_remote_module_non_scriptable.py 2022-05-18T04:45:09.9659230Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2lhakhuh 2022-05-18T04:45:09.9662469Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2lhakhuh/_remote_module_non_scriptable.py 2022-05-18T04:45:09.9808383Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T04:45:11.6070057Z ok (4.247s) 2022-05-18T04:45:11.6070313Z 2022-05-18T04:45:11.6071051Z ---------------------------------------------------------------------- 2022-05-18T04:45:11.6071870Z Ran 1 test in 4.247s 2022-05-18T04:45:11.6072187Z 2022-05-18T04:45:11.6072350Z OK 2022-05-18T04:45:11.6072510Z 2022-05-18T04:45:11.6072650Z Generating XML reports... 2022-05-18T04:45:11.6114252Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044507.xml 2022-05-18T04:45:12.7958974Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:12.7972339Z 2022-05-18T04:45:12.7972785Z Running tests... 2022-05-18T04:45:12.7973671Z ---------------------------------------------------------------------- 2022-05-18T04:45:14.4033997Z test_global_local_unused_params_grad (__main__.DistributedDataParallelTest) ... 
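[Annotation] The reducer.cpp warning above spells out the trade-off: find_unused_parameters=True adds an extra autograd-graph traversal every iteration, so it only pays off when some parameters genuinely receive no gradient. A hedged sketch, again assuming a single-process gloo group and made-up module names, of a model where the flag is actually needed because one submodule is never used in forward:

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class PartlyUsed(nn.Module):  # illustrative module, not from the test suite
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(8, 8)
        self.unused = nn.Linear(8, 8)  # never touched in forward

    def forward(self, x):
        return self.used(x)

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# Without find_unused_parameters=True, a real multi-process job would hang
# waiting for a gradient that self.unused never produces.
ddp = DDP(PartlyUsed(), find_unused_parameters=True)
ddp(torch.randn(4, 8)).sum().backward()
dist.destroy_process_group()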
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:14.4421452Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63722 2022-05-18T04:45:14.4529715Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63723 2022-05-18T04:45:15.3576743Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:15.3994235Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:15.4209938Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp1030ri9 2022-05-18T04:45:15.4210864Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9dr_kpq8 2022-05-18T04:45:15.4212602Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp1030ri9/_remote_module_non_scriptable.py 2022-05-18T04:45:15.4213754Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9dr_kpq8/_remote_module_non_scriptable.py 2022-05-18T04:45:17.0604330Z ok (4.263s) 2022-05-18T04:45:17.0604685Z 2022-05-18T04:45:17.0605169Z ---------------------------------------------------------------------- 2022-05-18T04:45:17.0605530Z Ran 1 test in 4.263s 2022-05-18T04:45:17.0605698Z 2022-05-18T04:45:17.0605801Z OK 2022-05-18T04:45:17.0606287Z 2022-05-18T04:45:17.0606427Z Generating XML reports... 2022-05-18T04:45:17.0647489Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044512.xml 2022-05-18T04:45:18.2475619Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:18.2489370Z 2022-05-18T04:45:18.2489903Z Running tests... 2022-05-18T04:45:18.2490418Z ---------------------------------------------------------------------- 2022-05-18T04:45:19.8192009Z test_global_local_unused_params_grad_with_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:19.8579653Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63837 2022-05-18T04:45:19.8688600Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63838 2022-05-18T04:45:20.8034765Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:20.8215258Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:20.8451376Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9jlerrhb 2022-05-18T04:45:20.8454054Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjkwzrlvv 2022-05-18T04:45:20.8454634Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9jlerrhb/_remote_module_non_scriptable.py 2022-05-18T04:45:20.8456728Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjkwzrlvv/_remote_module_non_scriptable.py 2022-05-18T04:45:22.4770643Z ok (4.228s) 2022-05-18T04:45:22.4770868Z 2022-05-18T04:45:22.4771458Z ---------------------------------------------------------------------- 2022-05-18T04:45:22.4771817Z Ran 1 test in 4.228s 2022-05-18T04:45:22.4771984Z 2022-05-18T04:45:22.4772086Z OK 2022-05-18T04:45:22.4772223Z 2022-05-18T04:45:22.4772359Z Generating XML reports... 
2022-05-18T04:45:22.4813851Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044518.xml 2022-05-18T04:45:23.6690722Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:23.6704552Z 2022-05-18T04:45:23.6704786Z Running tests... 2022-05-18T04:45:23.6705686Z ---------------------------------------------------------------------- 2022-05-18T04:45:25.2460720Z test_global_local_unused_params_grad_with_static_graph (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:25.2855479Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63952 2022-05-18T04:45:25.2964524Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63953 2022-05-18T04:45:26.2062511Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:26.2345784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:26.2560178Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkz4yco0w 2022-05-18T04:45:26.2563259Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkz4yco0w/_remote_module_non_scriptable.py 2022-05-18T04:45:26.2563848Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp51i3wt28 2022-05-18T04:45:26.2566404Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp51i3wt28/_remote_module_non_scriptable.py 2022-05-18T04:45:26.2709837Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:45:26.2710604Z warnings.warn( 2022-05-18T04:45:26.2711676Z /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py:1736: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:45:26.2712648Z warnings.warn( 2022-05-18T04:45:27.9039434Z ok (4.233s) 2022-05-18T04:45:27.9041125Z 2022-05-18T04:45:27.9041713Z ---------------------------------------------------------------------- 2022-05-18T04:45:27.9042456Z Ran 1 test in 4.233s 2022-05-18T04:45:27.9042830Z 2022-05-18T04:45:27.9042963Z OK 2022-05-18T04:45:27.9043103Z 2022-05-18T04:45:27.9043238Z Generating XML reports... 2022-05-18T04:45:27.9083377Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044523.xml 2022-05-18T04:45:29.0945376Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:29.0959442Z 2022-05-18T04:45:29.0959734Z Running tests... 2022-05-18T04:45:29.0960193Z ---------------------------------------------------------------------- 2022-05-18T04:45:30.6838210Z test_gloo_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) ... 
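[Annotation] The two *_device_ids_* variants that follow differ only in how the single device is spelled. A rough sketch, assuming at least one visible GPU (falling back to a CPU module otherwise); the single-process gloo setup mirrors the earlier sketches:

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

if torch.cuda.is_available():
    model = nn.Linear(8, 8).cuda(0)
    ddp = DDP(model, device_ids=[0])                     # integer list
    # DDP(model, device_ids=[torch.device("cuda:0")])    # torch.device list, same effect
else:
    # CPU modules take no device_ids (compare test_gloo_backend_cpu_module below).
    ddp = DDP(nn.Linear(8, 8))

ddp(torch.randn(4, 8).to(next(ddp.parameters()).device)).sum().backward()
dist.destroy_process_group()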
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:30.7235840Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64067 2022-05-18T04:45:30.7344789Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64068 2022-05-18T04:45:31.6587440Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:31.6621553Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:32.9618766Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6pb2ard4 2022-05-18T04:45:32.9619375Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6pb2ard4/_remote_module_non_scriptable.py 2022-05-18T04:45:32.9635680Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7el2p7fx 2022-05-18T04:45:32.9638600Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7el2p7fx/_remote_module_non_scriptable.py 2022-05-18T04:45:33.1559859Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:33.1560404Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:33.4424558Z ok (4.346s) 2022-05-18T04:45:33.4424820Z 2022-05-18T04:45:33.4425555Z ---------------------------------------------------------------------- 2022-05-18T04:45:33.4425919Z Ran 1 test in 4.346s 2022-05-18T04:45:33.4426066Z 2022-05-18T04:45:33.4426165Z OK 2022-05-18T04:45:33.4427011Z 2022-05-18T04:45:33.4427165Z Generating XML reports... 2022-05-18T04:45:33.4468318Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044529.xml 2022-05-18T04:45:34.6417726Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:34.6431894Z 2022-05-18T04:45:34.6432431Z Running tests... 2022-05-18T04:45:34.6432917Z ---------------------------------------------------------------------- 2022-05-18T04:45:36.2258316Z test_gloo_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:36.2646456Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64182 2022-05-18T04:45:36.2754864Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64183 2022-05-18T04:45:37.2027256Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:37.2073494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:38.5351237Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo1pv3485 2022-05-18T04:45:38.5352537Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo1pv3485/_remote_module_non_scriptable.py 2022-05-18T04:45:38.5384910Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4k5nf4fj 2022-05-18T04:45:38.5387973Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4k5nf4fj/_remote_module_non_scriptable.py 2022-05-18T04:45:38.7305983Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:38.7306536Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:45:39.0834614Z ok (4.440s) 2022-05-18T04:45:39.0834827Z 2022-05-18T04:45:39.0835224Z ---------------------------------------------------------------------- 2022-05-18T04:45:39.0835549Z Ran 1 test in 4.440s 2022-05-18T04:45:39.0835713Z 2022-05-18T04:45:39.0835810Z OK 2022-05-18T04:45:39.0835947Z 2022-05-18T04:45:39.0836099Z Generating XML reports... 2022-05-18T04:45:39.0878042Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044534.xml 2022-05-18T04:45:40.2761410Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:40.2775235Z 2022-05-18T04:45:40.2775652Z Running tests... 2022-05-18T04:45:40.2776163Z ---------------------------------------------------------------------- 2022-05-18T04:45:41.8780622Z test_gloo_backend_2gpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:41.9173328Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64297 2022-05-18T04:45:41.9281771Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64298 2022-05-18T04:45:42.8310725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:42.8747194Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:43.0325509Z skip: Need at least 4 CUDA devices (2.755s) 2022-05-18T04:45:43.0325775Z 2022-05-18T04:45:43.0326153Z ---------------------------------------------------------------------- 2022-05-18T04:45:43.0326491Z Ran 1 test in 2.755s 2022-05-18T04:45:43.0326656Z 2022-05-18T04:45:43.0326769Z OK (skipped=1) 2022-05-18T04:45:43.0326924Z 2022-05-18T04:45:43.0327051Z Generating XML reports... 2022-05-18T04:45:43.0369356Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044540.xml 2022-05-18T04:45:44.1962392Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:44.1976116Z 2022-05-18T04:45:44.1976521Z Running tests... 2022-05-18T04:45:44.1977014Z ---------------------------------------------------------------------- 2022-05-18T04:45:45.7388343Z test_gloo_backend_4gpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:45.7776479Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64400 2022-05-18T04:45:45.7881394Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64401 2022-05-18T04:45:46.7189858Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:46.7406922Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:46.8923229Z skip: Need at least 8 CUDA devices (2.694s) 2022-05-18T04:45:46.8923470Z 2022-05-18T04:45:46.8924112Z ---------------------------------------------------------------------- 2022-05-18T04:45:46.8924449Z Ran 1 test in 2.695s 2022-05-18T04:45:46.8924622Z 2022-05-18T04:45:46.8924718Z OK (skipped=1) 2022-05-18T04:45:46.8924878Z 2022-05-18T04:45:46.8925002Z Generating XML reports... 
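[Annotation] The "skip: Need at least N CUDA devices" lines above come from GPU-count guards in the test harness (the suite uses helpers from torch.testing._internal for this). A generic, hypothetical stand-in for that pattern, runnable on any machine:

import unittest
import torch

class GpuGatedTest(unittest.TestCase):
    @unittest.skipUnless(torch.cuda.device_count() >= 4,
                         "Need at least 4 CUDA devices")
    def test_needs_four_gpus(self):
        # Only runs when the machine exposes enough GPUs; otherwise reported
        # as a skip, exactly like the 2gpu/4gpu module tests in this log.
        self.assertGreaterEqual(torch.cuda.device_count(), 4)

if __name__ == "__main__":
    unittest.main()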
2022-05-18T04:45:46.8968067Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044544.xml 2022-05-18T04:45:48.0687378Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:48.0701066Z 2022-05-18T04:45:48.0701365Z Running tests... 2022-05-18T04:45:48.0702010Z ---------------------------------------------------------------------- 2022-05-18T04:45:49.6683377Z test_gloo_backend_cpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:49.7081332Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64503 2022-05-18T04:45:49.7190277Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64504 2022-05-18T04:45:50.6177869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:50.6284535Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:50.6601438Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_jun2i87 2022-05-18T04:45:50.6605711Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbrxn6wrj 2022-05-18T04:45:50.6606264Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_jun2i87/_remote_module_non_scriptable.py 2022-05-18T04:45:50.6606813Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbrxn6wrj/_remote_module_non_scriptable.py 2022-05-18T04:45:50.6801887Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:50.6802391Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:50.9234784Z ok (2.853s) 2022-05-18T04:45:50.9234998Z 2022-05-18T04:45:50.9235380Z ---------------------------------------------------------------------- 2022-05-18T04:45:50.9235717Z Ran 1 test in 2.853s 2022-05-18T04:45:50.9235882Z 2022-05-18T04:45:50.9235978Z OK 2022-05-18T04:45:50.9236120Z 2022-05-18T04:45:50.9236255Z Generating XML reports... 2022-05-18T04:45:50.9278605Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044548.xml 2022-05-18T04:45:52.1049959Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:52.1063125Z 2022-05-18T04:45:52.1063438Z Running tests... 2022-05-18T04:45:52.1064108Z ---------------------------------------------------------------------- 2022-05-18T04:45:53.7070760Z test_gloo_backend_cpu_module_grad_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:53.7458900Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64616 2022-05-18T04:45:53.7569879Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64617 2022-05-18T04:45:54.6650600Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:54.7004124Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:54.7225753Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp44g0n5br 2022-05-18T04:45:54.7228569Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp44g0n5br/_remote_module_non_scriptable.py 2022-05-18T04:45:54.7229350Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5o6xjf7w 2022-05-18T04:45:54.7231358Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5o6xjf7w/_remote_module_non_scriptable.py 2022-05-18T04:45:54.7426507Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:54.7427222Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:54.9611426Z ok (2.854s) 2022-05-18T04:45:54.9611636Z 2022-05-18T04:45:54.9612035Z ---------------------------------------------------------------------- 2022-05-18T04:45:54.9612458Z Ran 1 test in 2.855s 2022-05-18T04:45:54.9612727Z 2022-05-18T04:45:54.9613202Z OK 2022-05-18T04:45:54.9613346Z 2022-05-18T04:45:54.9613477Z Generating XML reports... 2022-05-18T04:45:54.9655671Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044552.xml 2022-05-18T04:45:56.1242137Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:45:56.1255838Z 2022-05-18T04:45:56.1256091Z Running tests... 2022-05-18T04:45:56.1256510Z ---------------------------------------------------------------------- 2022-05-18T04:45:56.1274699Z test_ignored_output (__main__.DistributedDataParallelTest) 2022-05-18T04:45:57.7103239Z Test that the output of a model can be ignored and that there is no ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:45:57.7496492Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64729 2022-05-18T04:45:57.7605342Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64730 2022-05-18T04:45:58.6617195Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:58.6728603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:58.7044911Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4egktedu 2022-05-18T04:45:58.7047781Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4egktedu/_remote_module_non_scriptable.py 2022-05-18T04:45:58.7048346Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0zjgo9w8 2022-05-18T04:45:58.7051141Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0zjgo9w8/_remote_module_non_scriptable.py 2022-05-18T04:45:58.7267201Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:58.7268160Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:45:58.9649226Z ok (2.839s) 2022-05-18T04:45:58.9649479Z 2022-05-18T04:45:58.9649897Z ---------------------------------------------------------------------- 2022-05-18T04:45:58.9650246Z Ran 1 test in 2.839s 2022-05-18T04:45:58.9650395Z 2022-05-18T04:45:58.9650491Z OK 2022-05-18T04:45:58.9651512Z 2022-05-18T04:45:58.9653518Z Generating XML reports... 2022-05-18T04:45:58.9693003Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044556.xml 2022-05-18T04:46:00.0907680Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:00.0922567Z 2022-05-18T04:46:00.0923134Z Running tests... 2022-05-18T04:46:00.0923610Z ---------------------------------------------------------------------- 2022-05-18T04:46:00.0943604Z test_ignored_output_with_unused_parameters (__main__.DistributedDataParallelTest) 2022-05-18T04:46:01.6760418Z Test that the output of a model can be ignored and that there is no ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:01.7163644Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64872 2022-05-18T04:46:01.7274017Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64873 2022-05-18T04:46:02.6701943Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:02.6845030Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:02.7121335Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjcfbzib3 2022-05-18T04:46:02.7123824Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvdfx3ps9 2022-05-18T04:46:02.7124366Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjcfbzib3/_remote_module_non_scriptable.py 2022-05-18T04:46:02.7126535Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvdfx3ps9/_remote_module_non_scriptable.py 2022-05-18T04:46:02.9318203Z ok (2.839s) 2022-05-18T04:46:02.9318496Z 2022-05-18T04:46:02.9319374Z ---------------------------------------------------------------------- 2022-05-18T04:46:02.9319705Z Ran 1 test in 2.840s 2022-05-18T04:46:02.9319872Z 2022-05-18T04:46:02.9319973Z OK 2022-05-18T04:46:02.9320113Z 2022-05-18T04:46:02.9320252Z Generating XML reports... 2022-05-18T04:46:02.9362472Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044600.xml 2022-05-18T04:46:04.1164056Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:04.1177869Z 2022-05-18T04:46:04.1178304Z Running tests... 2022-05-18T04:46:04.1178836Z ---------------------------------------------------------------------- 2022-05-18T04:46:05.7072002Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) ... 
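[Annotation] The PowerSGD config lines that follow enumerate the fields of the hook's state object, which test_invalid_powerSGD_state probes with invalid combinations. A minimal sketch of constructing a state and registering the built-in hook (single-process gloo group again; the layer size and numeric values are illustrative, not the invalid ones the test checks):

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

ddp = DDP(nn.Linear(32, 32))
state = powerSGD.PowerSGDState(
    process_group=None,            # None means the default process group
    matrix_approximation_rank=1,   # rank of the low-rank gradient approximation
    start_powerSGD_iter=2,         # plain allreduce for the first iterations
)
ddp.register_comm_hook(state, powerSGD.powerSGD_hook)
ddp(torch.randn(4, 32)).sum().backward()
dist.destroy_process_group()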
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:05.7474262Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65015 2022-05-18T04:46:05.7585107Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65016 2022-05-18T04:46:06.6725915Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:06.6731553Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.6732784Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.6733857Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.6734930Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.6736291Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.6737381Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.7050942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:06.7057629Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.7058719Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.7060017Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: 
matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.7061087Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.7062128Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.7063190Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:46:06.8626717Z ok (2.745s) 2022-05-18T04:46:06.8626951Z 2022-05-18T04:46:06.8627527Z ---------------------------------------------------------------------- 2022-05-18T04:46:06.8627945Z Ran 1 test in 2.745s 2022-05-18T04:46:06.8628114Z 2022-05-18T04:46:06.8628190Z OK 2022-05-18T04:46:06.8628328Z 2022-05-18T04:46:06.8628461Z Generating XML reports... 2022-05-18T04:46:06.8670793Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044604.xml 2022-05-18T04:46:08.0404832Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:08.0418294Z 2022-05-18T04:46:08.0418726Z Running tests... 2022-05-18T04:46:08.0419232Z ---------------------------------------------------------------------- 2022-05-18T04:46:09.6314375Z test_save_load_checkpoint (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:09.6712796Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65118 2022-05-18T04:46:09.6820640Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65119 2022-05-18T04:46:10.6538191Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:10.6582547Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:10.6748675Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:10.6749421Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:10.6750235Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:10.6750922Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
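[Annotation] test_save_load_checkpoint, whose startup is logged just above, exercises checkpointing a DDP-wrapped model. A hedged sketch of the usual pattern, kept free of process-group setup by using a bare module; with DDP one typically saves ddp.module.state_dict() on rank 0 only and loads it on every rank:

import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for ddp.module
path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")

# Save: under DDP, do this on rank 0 only and unwrap via ddp.module.
torch.save(model.state_dict(), path)

# Load: map_location controls where the restored tensors land.
restored = nn.Linear(8, 8)
restored.load_state_dict(torch.load(path, map_location="cpu"))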
2022-05-18T04:46:11.9694668Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp33xub_ix 2022-05-18T04:46:11.9695858Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp33xub_ix/_remote_module_non_scriptable.py 2022-05-18T04:46:11.9732029Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9_pr9u42 2022-05-18T04:46:11.9735302Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9_pr9u42/_remote_module_non_scriptable.py 2022-05-18T04:46:12.1626004Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:12.1627030Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:12.1763693Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:12.1764644Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:12.4901584Z ok (4.448s) 2022-05-18T04:46:12.4901959Z 2022-05-18T04:46:12.4902775Z ---------------------------------------------------------------------- 2022-05-18T04:46:12.4903371Z Ran 1 test in 4.448s 2022-05-18T04:46:12.4903525Z 2022-05-18T04:46:12.4903829Z OK 2022-05-18T04:46:12.4903974Z 2022-05-18T04:46:12.4904110Z Generating XML reports... 2022-05-18T04:46:12.4946006Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044608.xml 2022-05-18T04:46:13.6712214Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:13.6725239Z 2022-05-18T04:46:13.6725593Z Running tests... 2022-05-18T04:46:13.6726509Z ---------------------------------------------------------------------- 2022-05-18T04:46:15.2591139Z test_sparse_gradients (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:15.2985893Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65233 2022-05-18T04:46:15.3093239Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65234 2022-05-18T04:46:16.2278044Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:16.2546510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:16.2802586Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzygvze33 2022-05-18T04:46:16.2805309Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzygvze33/_remote_module_non_scriptable.py 2022-05-18T04:46:16.2806301Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptqe6r_fp 2022-05-18T04:46:16.2808548Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptqe6r_fp/_remote_module_non_scriptable.py 2022-05-18T04:46:16.5136062Z ok (2.841s) 2022-05-18T04:46:16.5136461Z 2022-05-18T04:46:16.5137235Z ---------------------------------------------------------------------- 2022-05-18T04:46:16.5137651Z Ran 1 test in 2.841s 2022-05-18T04:46:16.5137802Z 2022-05-18T04:46:16.5138223Z OK 2022-05-18T04:46:16.5138379Z 2022-05-18T04:46:16.5138515Z Generating XML reports... 
2022-05-18T04:46:16.5180030Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044613.xml 2022-05-18T04:46:17.6685728Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:17.6698175Z 2022-05-18T04:46:17.6698569Z Running tests... 2022-05-18T04:46:17.6699545Z ---------------------------------------------------------------------- 2022-05-18T04:46:19.2202074Z test_sparse_gradients_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:19.2587770Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65376 2022-05-18T04:46:19.2694056Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65377 2022-05-18T04:46:20.2034373Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:20.2035263Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:20.2255971Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprif8hjjf 2022-05-18T04:46:20.2256988Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqr43ygmz 2022-05-18T04:46:20.2258333Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprif8hjjf/_remote_module_non_scriptable.py 2022-05-18T04:46:20.2259391Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqr43ygmz/_remote_module_non_scriptable.py 2022-05-18T04:46:20.4736013Z ok (2.803s) 2022-05-18T04:46:20.4736245Z 2022-05-18T04:46:20.4736654Z ---------------------------------------------------------------------- 2022-05-18T04:46:20.4736982Z Ran 1 test in 2.804s 2022-05-18T04:46:20.4737147Z 2022-05-18T04:46:20.4737244Z OK 2022-05-18T04:46:20.4737379Z 2022-05-18T04:46:20.4737513Z Generating XML reports... 2022-05-18T04:46:20.4781589Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044617.xml 2022-05-18T04:46:21.6608640Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:21.6622398Z 2022-05-18T04:46:21.6622767Z Running tests... 2022-05-18T04:46:21.6623233Z ---------------------------------------------------------------------- 2022-05-18T04:46:23.2470731Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) ... 
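[Annotation] The two sync-batch-norm tests that follow convert a model's BatchNorm layers and then feed (partially) empty batches. Just the conversion step as a standalone sketch (layer shapes are arbitrary); actually synchronizing statistics additionally requires an initialized process group and GPUs under DDP:

import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8), nn.ReLU())
# Every BatchNorm*d layer is replaced by SyncBatchNorm, which reduces batch
# statistics across ranks during forward when run under DDP.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(sync_model)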
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:23.2861874Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65519 2022-05-18T04:46:23.2970146Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65520 2022-05-18T04:46:24.1967860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:24.2039668Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:25.5143308Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpva21832s 2022-05-18T04:46:25.5144309Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpva21832s/_remote_module_non_scriptable.py 2022-05-18T04:46:25.5428428Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk4zrqq3z 2022-05-18T04:46:25.5431174Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk4zrqq3z/_remote_module_non_scriptable.py 2022-05-18T04:46:26.2644613Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:26.2645133Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:26.6060745Z ok (4.943s) 2022-05-18T04:46:26.6061095Z 2022-05-18T04:46:26.6061627Z ---------------------------------------------------------------------- 2022-05-18T04:46:26.6061983Z Ran 1 test in 4.944s 2022-05-18T04:46:26.6062150Z 2022-05-18T04:46:26.6062562Z OK 2022-05-18T04:46:26.6062703Z 2022-05-18T04:46:26.6062840Z Generating XML reports... 2022-05-18T04:46:26.6105611Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044621.xml 2022-05-18T04:46:27.8041088Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:27.8054583Z 2022-05-18T04:46:27.8055025Z Running tests... 2022-05-18T04:46:27.8055620Z ---------------------------------------------------------------------- 2022-05-18T04:46:29.3794765Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:29.4192858Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65634 2022-05-18T04:46:29.4301034Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65635 2022-05-18T04:46:30.3472629Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:30.3808261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:31.6746324Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaq2uadrr 2022-05-18T04:46:31.6747207Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaq2uadrr/_remote_module_non_scriptable.py 2022-05-18T04:46:31.6968494Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpent25yxa 2022-05-18T04:46:31.6971226Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpent25yxa/_remote_module_non_scriptable.py 2022-05-18T04:46:32.2724320Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:32.2724893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:46:32.6390444Z ok (4.833s) 2022-05-18T04:46:32.6390747Z 2022-05-18T04:46:32.6391355Z ---------------------------------------------------------------------- 2022-05-18T04:46:32.6391732Z Ran 1 test in 4.834s 2022-05-18T04:46:32.6391901Z 2022-05-18T04:46:32.6391997Z OK 2022-05-18T04:46:32.6392137Z 2022-05-18T04:46:32.6392253Z Generating XML reports... 2022-05-18T04:46:32.6434093Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044627.xml 2022-05-18T04:46:33.8065512Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:33.8078964Z 2022-05-18T04:46:33.8079111Z Running tests... 2022-05-18T04:46:33.8079568Z ---------------------------------------------------------------------- 2022-05-18T04:46:35.3774830Z test_allgather_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:35.4167783Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65749 2022-05-18T04:46:35.4276286Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65750 2022-05-18T04:46:35.4387010Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 65751 2022-05-18T04:46:35.4497217Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 65752 2022-05-18T04:46:36.3925343Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:36.4480326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:46:36.4826275Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:46:36.4830213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:36.7547250Z ok (2.946s) 2022-05-18T04:46:36.7547628Z 2022-05-18T04:46:36.7548081Z ---------------------------------------------------------------------- 2022-05-18T04:46:36.7548483Z Ran 1 test in 2.947s 2022-05-18T04:46:36.7548653Z 2022-05-18T04:46:36.7548752Z OK 2022-05-18T04:46:36.7548890Z 2022-05-18T04:46:36.7549344Z Generating XML reports... 2022-05-18T04:46:36.7591773Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044633.xml 2022-05-18T04:46:37.9252719Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:37.9275898Z 2022-05-18T04:46:37.9276198Z Running tests... 2022-05-18T04:46:37.9276848Z ---------------------------------------------------------------------- 2022-05-18T04:46:39.5026506Z test_allgather_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:39.5422096Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65932 2022-05-18T04:46:39.5529490Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65933 2022-05-18T04:46:39.5640349Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 65934 2022-05-18T04:46:39.5750055Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 65935 2022-05-18T04:46:40.5613269Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:40.5667190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:40.5815554Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:46:40.5842136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:46:42.5841343Z ok (4.657s) 2022-05-18T04:46:42.5841726Z 2022-05-18T04:46:42.5842384Z ---------------------------------------------------------------------- 2022-05-18T04:46:42.5843029Z Ran 1 test in 4.657s 2022-05-18T04:46:42.5843347Z 2022-05-18T04:46:42.5843525Z OK 2022-05-18T04:46:42.5843776Z 2022-05-18T04:46:42.5843995Z Generating XML reports... 2022-05-18T04:46:42.5888032Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044637.xml 2022-05-18T04:46:43.7938838Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:43.7952931Z 2022-05-18T04:46:43.7953063Z Running tests... 2022-05-18T04:46:43.7953802Z ---------------------------------------------------------------------- 2022-05-18T04:46:45.3757688Z test_allgather_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:45.4155689Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66119 2022-05-18T04:46:45.4264410Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66120 2022-05-18T04:46:45.4374216Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 66121 2022-05-18T04:46:45.4483977Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 66122 2022-05-18T04:46:46.4135295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:46:46.4356278Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:46.4622706Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:46.4761240Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:46:46.7534368Z ok (2.958s) 2022-05-18T04:46:46.7534584Z 2022-05-18T04:46:46.7535002Z ---------------------------------------------------------------------- 2022-05-18T04:46:46.7535339Z Ran 1 test in 2.958s 2022-05-18T04:46:46.7535508Z 2022-05-18T04:46:46.7535581Z OK 2022-05-18T04:46:46.7535721Z 2022-05-18T04:46:46.7535853Z Generating XML reports... 2022-05-18T04:46:46.7578678Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044643.xml 2022-05-18T04:46:47.9489441Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:47.9505319Z 2022-05-18T04:46:47.9505828Z Running tests... 
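The test_allgather_* runs above exercise the all-gather collective: every rank contributes one tensor and receives a copy of every rank's tensor. A minimal sketch of that collective follows, assuming a gloo process group has already been initialized as in the earlier example.

```python
import torch
import torch.distributed as dist


def run_allgather() -> None:
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    inp = torch.full((4,), float(rank))
    outputs = [torch.empty(4) for _ in range(world_size)]
    dist.all_gather(outputs, inp)

    # Every rank now holds the tensor contributed by every other rank.
    for src, t in enumerate(outputs):
        assert torch.equal(t, torch.full((4,), float(src)))
```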
2022-05-18T04:46:47.9506694Z ---------------------------------------------------------------------- 2022-05-18T04:46:49.5181563Z test_allgather_coalesced_async (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:49.5569593Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66302 2022-05-18T04:46:49.5676569Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66303 2022-05-18T04:46:49.5784814Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 66304 2022-05-18T04:46:49.5893208Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 66305 2022-05-18T04:46:50.5045091Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:46:50.5402917Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:50.5427644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:46:50.5464328Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:50.5741184Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:50.5842644Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:50.5946075Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T04:46:50.5946633Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T04:46:50.5947539Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T04:46:50.5948394Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T04:46:50.5949102Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T04:46:50.6047802Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T04:46:50.8941805Z ok (2.943s) 2022-05-18T04:46:50.8942044Z 2022-05-18T04:46:50.8942674Z ---------------------------------------------------------------------- 2022-05-18T04:46:50.8943052Z Ran 1 test in 2.944s 2022-05-18T04:46:50.8943227Z 2022-05-18T04:46:50.8943304Z OK 2022-05-18T04:46:50.8943439Z 2022-05-18T04:46:50.8943765Z Generating XML reports... 2022-05-18T04:46:50.8986966Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044647.xml 2022-05-18T04:46:52.0684297Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:52.0699016Z 2022-05-18T04:46:52.0699160Z Running tests... 2022-05-18T04:46:52.0699910Z ---------------------------------------------------------------------- 2022-05-18T04:46:53.6496548Z test_allgather_coalesced_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:53.6894292Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66485 2022-05-18T04:46:53.7003858Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66486 2022-05-18T04:46:53.7113422Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 66487 2022-05-18T04:46:53.7223353Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 66488 2022-05-18T04:46:54.6845054Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:54.7091675Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:46:54.7167563Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:54.7517953Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:46:55.0274664Z ok (2.957s) 2022-05-18T04:46:55.0275084Z 2022-05-18T04:46:55.0275862Z ---------------------------------------------------------------------- 2022-05-18T04:46:55.0276272Z Ran 1 test in 2.958s 2022-05-18T04:46:55.0276421Z 2022-05-18T04:46:55.0276517Z OK 2022-05-18T04:46:55.0276654Z 2022-05-18T04:46:55.0276792Z Generating XML reports... 2022-05-18T04:46:55.0318814Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044652.xml 2022-05-18T04:46:56.2070301Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:46:56.2084412Z 2022-05-18T04:46:56.2084927Z Running tests... 2022-05-18T04:46:56.2085447Z ---------------------------------------------------------------------- 2022-05-18T04:46:57.7791614Z test_allgather_noncontiguous_input (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:46:57.8180293Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66668 2022-05-18T04:46:57.8287630Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66669 2022-05-18T04:46:57.8397567Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 66670 2022-05-18T04:46:57.8507402Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 66671 2022-05-18T04:46:58.7892753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:58.8010833Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:58.8132319Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:46:58.8293322Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:46:59.0555248Z ok (2.847s) 2022-05-18T04:46:59.0555453Z 2022-05-18T04:46:59.0555897Z ---------------------------------------------------------------------- 2022-05-18T04:46:59.0556225Z Ran 1 test in 2.847s 2022-05-18T04:46:59.0556393Z 2022-05-18T04:46:59.0556494Z OK 2022-05-18T04:46:59.0556637Z 2022-05-18T04:46:59.0556779Z Generating XML reports... 2022-05-18T04:46:59.0601731Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044656.xml 2022-05-18T04:47:00.2234340Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:00.2247606Z 2022-05-18T04:47:00.2247847Z Running tests... 
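The "Added key: store_based_barrier_key:1 ..." and "Completed store-based barrier ..." lines above appear to come from the store-based barrier that torch.distributed runs when a process group finishes rendezvous, so that every rank agrees the group exists before any collective is issued. Below is a minimal sketch of that rendezvous using an explicit TCPStore; the host and port are illustrative values, not taken from the log.

```python
import datetime

import torch.distributed as dist


def init_with_explicit_store(rank: int, world_size: int) -> None:
    # Rank 0 hosts the store; all other ranks connect to it.
    store = dist.TCPStore(
        "127.0.0.1",
        29502,
        world_size,
        is_master=(rank == 0),
        timeout=datetime.timedelta(seconds=30),
    )
    # init_process_group uses the shared store for rendezvous and then
    # performs a store-based barrier across all ranks.
    dist.init_process_group(
        backend="gloo", store=store, rank=rank, world_size=world_size
    )
```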
2022-05-18T04:47:00.2248443Z ---------------------------------------------------------------------- 2022-05-18T04:47:01.7768954Z test_allgather_stress (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:01.8168082Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66851 2022-05-18T04:47:01.8279218Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66852 2022-05-18T04:47:01.8393071Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 66853 2022-05-18T04:47:01.8507238Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 66854 2022-05-18T04:47:02.7496670Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:02.7780726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:02.8129331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:02.8283643Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:03.6569416Z ok (3.432s) 2022-05-18T04:47:03.6569630Z 2022-05-18T04:47:03.6570035Z ---------------------------------------------------------------------- 2022-05-18T04:47:03.6570385Z Ran 1 test in 3.432s 2022-05-18T04:47:03.6570549Z 2022-05-18T04:47:03.6570637Z OK 2022-05-18T04:47:03.6570771Z 2022-05-18T04:47:03.6571242Z Generating XML reports... 2022-05-18T04:47:03.6615107Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044700.xml 2022-05-18T04:47:04.8699545Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:04.8714237Z 2022-05-18T04:47:04.8714475Z Running tests... 2022-05-18T04:47:04.8714943Z ---------------------------------------------------------------------- 2022-05-18T04:47:06.4435376Z test_allgather_stress_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:06.4834875Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67058 2022-05-18T04:47:06.4942355Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67059 2022-05-18T04:47:06.5052929Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 67060 2022-05-18T04:47:06.5166373Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 67061 2022-05-18T04:47:07.4217732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:07.4275189Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:07.4742250Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:07.4788673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:11.2293180Z ok (6.358s) 2022-05-18T04:47:11.2293401Z 2022-05-18T04:47:11.2293807Z ---------------------------------------------------------------------- 2022-05-18T04:47:11.2294129Z Ran 1 test in 6.358s 2022-05-18T04:47:11.2294299Z 2022-05-18T04:47:11.2294395Z OK 2022-05-18T04:47:11.2294535Z 2022-05-18T04:47:11.2294669Z Generating XML reports... 
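The *_stress variants above (for example test_allgather_stress and test_allgather_stress_cuda) take noticeably longer because they queue many iterations of the same collective. A minimal sketch of that pattern, assuming an initialized gloo group: issue the collectives asynchronously, then wait on each returned work handle and check the results.

```python
import torch
import torch.distributed as dist


def allgather_stress(num_iters: int = 100) -> None:
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    handles, outputs = [], []
    for i in range(num_iters):
        inp = torch.full((4,), float(rank + i))
        out = [torch.empty(4) for _ in range(world_size)]
        handles.append(dist.all_gather(out, inp, async_op=True))
        outputs.append(out)

    # Wait for every queued collective and verify its result.
    for i, work in enumerate(handles):
        work.wait()
        for src, t in enumerate(outputs[i]):
            assert torch.equal(t, torch.full((4,), float(src + i)))
```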
2022-05-18T04:47:11.2338528Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044704.xml 2022-05-18T04:47:12.4407123Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:12.4420747Z 2022-05-18T04:47:12.4421102Z Running tests... 2022-05-18T04:47:12.4421793Z ---------------------------------------------------------------------- 2022-05-18T04:47:14.0023571Z test_allreduce_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:14.0422264Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67269 2022-05-18T04:47:14.0530778Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67270 2022-05-18T04:47:14.0639325Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 67271 2022-05-18T04:47:14.0750686Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 67272 2022-05-18T04:47:15.0470935Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:15.0797037Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:15.1000658Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:15.1211843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:15.3801191Z ok (2.938s) 2022-05-18T04:47:15.3801408Z 2022-05-18T04:47:15.3801828Z ---------------------------------------------------------------------- 2022-05-18T04:47:15.3802174Z Ran 1 test in 2.938s 2022-05-18T04:47:15.3802351Z 2022-05-18T04:47:15.3802446Z OK 2022-05-18T04:47:15.3802584Z 2022-05-18T04:47:15.3802720Z Generating XML reports... 2022-05-18T04:47:15.3844849Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044712.xml 2022-05-18T04:47:16.5591502Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:16.5605984Z 2022-05-18T04:47:16.5606463Z Running tests... 2022-05-18T04:47:16.5607272Z ---------------------------------------------------------------------- 2022-05-18T04:47:18.1395896Z test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:18.1796862Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67452 2022-05-18T04:47:18.1907404Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67453 2022-05-18T04:47:18.2017874Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 67454 2022-05-18T04:47:18.2128888Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 67455 2022-05-18T04:47:19.1273057Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:19.1367838Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:19.1423188Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:19.2079083Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:21.0215358Z ok (4.461s) 2022-05-18T04:47:21.0215579Z 2022-05-18T04:47:21.0216004Z ---------------------------------------------------------------------- 2022-05-18T04:47:21.0216352Z Ran 1 test in 4.461s 2022-05-18T04:47:21.0216515Z 2022-05-18T04:47:21.0216597Z OK 2022-05-18T04:47:21.0217073Z 2022-05-18T04:47:21.0217208Z Generating XML reports... 2022-05-18T04:47:21.0260411Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044716.xml 2022-05-18T04:47:22.2209049Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:22.2223008Z 2022-05-18T04:47:22.2223445Z Running tests... 2022-05-18T04:47:22.2223934Z ---------------------------------------------------------------------- 2022-05-18T04:47:23.8110441Z test_allreduce_basics_cuda_using_work_api (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:23.8508729Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67639 2022-05-18T04:47:23.8617602Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67640 2022-05-18T04:47:23.8729625Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 67641 2022-05-18T04:47:23.8840298Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 67642 2022-05-18T04:47:24.7670422Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:24.8026930Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:24.8104561Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:24.8362358Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:26.6926428Z ok (4.470s) 2022-05-18T04:47:26.6926647Z 2022-05-18T04:47:26.6927078Z ---------------------------------------------------------------------- 2022-05-18T04:47:26.6927432Z Ran 1 test in 4.470s 2022-05-18T04:47:26.6927597Z 2022-05-18T04:47:26.6927676Z OK 2022-05-18T04:47:26.6927815Z 2022-05-18T04:47:26.6927955Z Generating XML reports... 2022-05-18T04:47:26.6971939Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044722.xml 2022-05-18T04:47:27.9075037Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:27.9088930Z 2022-05-18T04:47:27.9089339Z Running tests... 
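The test_allreduce_basics* runs above exercise the all-reduce collective, in which every rank ends up with the same reduced tensor. A minimal sketch, assuming an initialized gloo group, showing the same call with two different reduction ops:

```python
import torch
import torch.distributed as dist


def allreduce_basics() -> None:
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # SUM: every rank ends up with the sum of all contributions.
    t = torch.full((4,), float(rank + 1))
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    assert torch.equal(t, torch.full((4,), float(sum(range(1, world_size + 1)))))

    # MAX: every rank ends up with the largest contribution.
    t = torch.full((4,), float(rank + 1))
    dist.all_reduce(t, op=dist.ReduceOp.MAX)
    assert torch.equal(t, torch.full((4,), float(world_size)))
```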
2022-05-18T04:47:27.9089855Z ---------------------------------------------------------------------- 2022-05-18T04:47:29.5061360Z test_allreduce_basics_using_work_api (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:29.5465914Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67826 2022-05-18T04:47:29.5576768Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67827 2022-05-18T04:47:29.5688008Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 67828 2022-05-18T04:47:29.5803161Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 67829 2022-05-18T04:47:30.5058735Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:30.5119758Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:30.5511961Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:30.5849436Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:30.8853190Z ok (2.976s) 2022-05-18T04:47:30.8853417Z 2022-05-18T04:47:30.8853818Z ---------------------------------------------------------------------- 2022-05-18T04:47:30.8854147Z Ran 1 test in 2.976s 2022-05-18T04:47:30.8854316Z 2022-05-18T04:47:30.8854442Z OK 2022-05-18T04:47:30.8854578Z 2022-05-18T04:47:30.8854715Z Generating XML reports... 2022-05-18T04:47:30.8898077Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044727.xml 2022-05-18T04:47:32.0778681Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:32.0792334Z 2022-05-18T04:47:32.0792550Z Running tests... 2022-05-18T04:47:32.0792997Z ---------------------------------------------------------------------- 2022-05-18T04:47:33.6679645Z test_allreduce_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:33.7077608Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68009 2022-05-18T04:47:33.7186577Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68010 2022-05-18T04:47:33.7295316Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 68011 2022-05-18T04:47:33.7405896Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 68012 2022-05-18T04:47:34.7206548Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:34.7355902Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:34.7700532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:34.7784456Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:35.0456367Z ok (2.966s) 2022-05-18T04:47:35.0456593Z 2022-05-18T04:47:35.0456995Z ---------------------------------------------------------------------- 2022-05-18T04:47:35.0457317Z Ran 1 test in 2.966s 2022-05-18T04:47:35.0457484Z 2022-05-18T04:47:35.0457578Z OK 2022-05-18T04:47:35.0457713Z 2022-05-18T04:47:35.0457853Z Generating XML reports... 
2022-05-18T04:47:35.0500511Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044732.xml 2022-05-18T04:47:36.2494277Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:36.2508458Z 2022-05-18T04:47:36.2508738Z Running tests... 2022-05-18T04:47:36.2509169Z ---------------------------------------------------------------------- 2022-05-18T04:47:37.8424726Z test_allreduce_coalesced_async (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:37.8816602Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68192 2022-05-18T04:47:37.8924576Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68193 2022-05-18T04:47:37.9034849Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 68194 2022-05-18T04:47:37.9143501Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 68195 2022-05-18T04:47:38.8958047Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:38.9076155Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:38.9297770Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:38.9689830Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:38.9799090Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:38.9815079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T04:47:38.9816511Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:38.9817160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T04:47:38.9817974Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T04:47:38.9818677Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T04:47:38.9819368Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T04:47:38.9902092Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T04:47:39.2193899Z ok (2.968s) 2022-05-18T04:47:39.2194108Z 2022-05-18T04:47:39.2194496Z ---------------------------------------------------------------------- 2022-05-18T04:47:39.2194818Z Ran 1 test in 2.969s 2022-05-18T04:47:39.2194985Z 2022-05-18T04:47:39.2195078Z OK 2022-05-18T04:47:39.2195212Z 2022-05-18T04:47:39.2197586Z Generating XML reports... 2022-05-18T04:47:39.2239188Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044736.xml 2022-05-18T04:47:40.4242531Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:40.4257215Z 2022-05-18T04:47:40.4257662Z Running tests... 2022-05-18T04:47:40.4258245Z ---------------------------------------------------------------------- 2022-05-18T04:47:41.9957508Z test_allreduce_coalesced_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:42.0346988Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68375 2022-05-18T04:47:42.0454986Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68376 2022-05-18T04:47:42.0564357Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 68377 2022-05-18T04:47:42.0673097Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 68378 2022-05-18T04:47:42.9679136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:42.9934825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:43.0401345Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:43.0542781Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:43.3723153Z ok (2.946s) 2022-05-18T04:47:43.3723376Z 2022-05-18T04:47:43.3723818Z ---------------------------------------------------------------------- 2022-05-18T04:47:43.3724163Z Ran 1 test in 2.947s 2022-05-18T04:47:43.3724337Z 2022-05-18T04:47:43.3724417Z OK 2022-05-18T04:47:43.3724554Z 2022-05-18T04:47:43.3724693Z Generating XML reports... 2022-05-18T04:47:43.3768494Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044740.xml 2022-05-18T04:47:44.5620926Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:44.5634876Z 2022-05-18T04:47:44.5635316Z Running tests... 2022-05-18T04:47:44.5636122Z ---------------------------------------------------------------------- 2022-05-18T04:47:46.1384169Z test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:46.1775589Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68558 2022-05-18T04:47:46.1885437Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68559 2022-05-18T04:47:46.1996703Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 68560 2022-05-18T04:47:46.2109401Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 68561 2022-05-18T04:47:47.1765100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:47.1938863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:47.2102692Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:47.2129774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:47.5159454Z ok (2.952s) 2022-05-18T04:47:47.5159711Z 2022-05-18T04:47:47.5160134Z ---------------------------------------------------------------------- 2022-05-18T04:47:47.5160488Z Ran 1 test in 2.952s 2022-05-18T04:47:47.5160656Z 2022-05-18T04:47:47.5161088Z OK 2022-05-18T04:47:47.5161231Z 2022-05-18T04:47:47.5161348Z Generating XML reports... 2022-05-18T04:47:47.5203244Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044744.xml 2022-05-18T04:47:48.7013427Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:48.7027841Z 2022-05-18T04:47:48.7027988Z Running tests... 
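The test_allreduce_coalesced_* runs above exercise the coalesced variant, where several tensors are reduced in a single call rather than one collective per tensor. A minimal sketch, assuming an initialized gloo group and the all_reduce_coalesced API available in torch.distributed at the time of this log:

```python
import torch
import torch.distributed as dist


def allreduce_coalesced_basics() -> None:
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Two tensors of different shapes, reduced together in one call.
    tensors = [torch.full((2,), float(rank)), torch.full((3,), float(rank) * 2)]
    dist.all_reduce_coalesced(tensors)  # defaults to SUM

    expected = float(sum(range(world_size)))
    assert torch.equal(tensors[0], torch.full((2,), expected))
    assert torch.equal(tensors[1], torch.full((3,), expected * 2))
```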
2022-05-18T04:47:48.7028433Z ---------------------------------------------------------------------- 2022-05-18T04:47:50.3029149Z test_allreduce_coalesced_checks_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:50.3430526Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68741 2022-05-18T04:47:50.3540952Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68742 2022-05-18T04:47:50.3651571Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 68743 2022-05-18T04:47:50.3765357Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 68744 2022-05-18T04:47:51.2796717Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:51.2930664Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:51.3255008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:51.3674906Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:53.2850966Z ok (4.582s) 2022-05-18T04:47:53.2851258Z 2022-05-18T04:47:53.2851816Z ---------------------------------------------------------------------- 2022-05-18T04:47:53.2852166Z Ran 1 test in 4.582s 2022-05-18T04:47:53.2852313Z 2022-05-18T04:47:53.2852415Z OK 2022-05-18T04:47:53.2852555Z 2022-05-18T04:47:53.2852691Z Generating XML reports... 2022-05-18T04:47:53.2895601Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044748.xml 2022-05-18T04:47:54.4923172Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:54.4937214Z 2022-05-18T04:47:54.4937359Z Running tests... 2022-05-18T04:47:54.4937799Z ---------------------------------------------------------------------- 2022-05-18T04:47:56.0729097Z test_allreduce_coalesced_stress (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:47:56.1116756Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68928 2022-05-18T04:47:56.1224459Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68929 2022-05-18T04:47:56.1333577Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 68930 2022-05-18T04:47:56.1442407Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 68931 2022-05-18T04:47:57.0441003Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:57.0980279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:47:57.0996926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:47:57.1048661Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:57.7498386Z ok (3.256s) 2022-05-18T04:47:57.7498613Z 2022-05-18T04:47:57.7499044Z ---------------------------------------------------------------------- 2022-05-18T04:47:57.7499369Z Ran 1 test in 3.256s 2022-05-18T04:47:57.7499538Z 2022-05-18T04:47:57.7499661Z OK 2022-05-18T04:47:57.7499798Z 2022-05-18T04:47:57.7499936Z Generating XML reports... 
2022-05-18T04:47:57.7541619Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044754.xml 2022-05-18T04:47:58.9411108Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:47:58.9424948Z 2022-05-18T04:47:58.9425456Z Running tests... 2022-05-18T04:47:58.9425943Z ---------------------------------------------------------------------- 2022-05-18T04:48:00.5135835Z test_allreduce_stress (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:00.5526205Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69135 2022-05-18T04:48:00.5634108Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69136 2022-05-18T04:48:00.5742269Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 69137 2022-05-18T04:48:00.5851429Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 69138 2022-05-18T04:48:01.5590367Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:01.5745282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:01.5865140Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:01.6044251Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:02.0905198Z ok (3.148s) 2022-05-18T04:48:02.0905438Z 2022-05-18T04:48:02.0905836Z ---------------------------------------------------------------------- 2022-05-18T04:48:02.0906180Z Ran 1 test in 3.148s 2022-05-18T04:48:02.0906347Z 2022-05-18T04:48:02.0906451Z OK 2022-05-18T04:48:02.0906568Z 2022-05-18T04:48:02.0906700Z Generating XML reports... 2022-05-18T04:48:02.0949510Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044758.xml 2022-05-18T04:48:03.3118971Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:03.3133237Z 2022-05-18T04:48:03.3133519Z Running tests... 2022-05-18T04:48:03.3133955Z ---------------------------------------------------------------------- 2022-05-18T04:48:04.9075917Z test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:04.9476650Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69342 2022-05-18T04:48:04.9586545Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69343 2022-05-18T04:48:04.9698909Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 69344 2022-05-18T04:48:04.9809499Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 69345 2022-05-18T04:48:05.8999332Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:05.9037768Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:05.9507135Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:05.9592590Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:08.3907568Z ok (5.077s) 2022-05-18T04:48:08.3907843Z 2022-05-18T04:48:08.3908240Z ---------------------------------------------------------------------- 2022-05-18T04:48:08.3908584Z Ran 1 test in 5.077s 2022-05-18T04:48:08.3908750Z 2022-05-18T04:48:08.3908845Z OK 2022-05-18T04:48:08.3908960Z 2022-05-18T04:48:08.3909094Z Generating XML reports... 2022-05-18T04:48:08.3951708Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044803.xml 2022-05-18T04:48:09.5916357Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:09.5930295Z 2022-05-18T04:48:09.5930584Z Running tests... 2022-05-18T04:48:09.5931016Z ---------------------------------------------------------------------- 2022-05-18T04:48:11.1645095Z test_barrier_implies_wait (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:11.2043797Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69553 2022-05-18T04:48:11.2154017Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69554 2022-05-18T04:48:11.2265113Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 69555 2022-05-18T04:48:11.2375079Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 69556 2022-05-18T04:48:12.1697347Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:12.2046825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:12.2099213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:12.2214812Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:12.5424398Z ok (2.949s) 2022-05-18T04:48:12.5424621Z 2022-05-18T04:48:12.5425075Z ---------------------------------------------------------------------- 2022-05-18T04:48:12.5425427Z Ran 1 test in 2.949s 2022-05-18T04:48:12.5425593Z 2022-05-18T04:48:12.5425693Z OK 2022-05-18T04:48:12.5425831Z 2022-05-18T04:48:12.5425967Z Generating XML reports... 2022-05-18T04:48:12.5469385Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044809.xml 2022-05-18T04:48:13.7196913Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:13.7210489Z 2022-05-18T04:48:13.7210635Z Running tests... 
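The test_barrier_implies_wait run above checks a specific property of the gloo process group: once barrier() returns, collectives enqueued earlier on the same group have completed, so their results can be read without waiting on each work handle individually. A minimal sketch of that property, assuming an initialized gloo group:

```python
import torch
import torch.distributed as dist


def barrier_implies_wait() -> None:
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    tensors = [torch.full((4,), float(rank)) for _ in range(10)]
    for t in tensors:
        dist.all_reduce(t, async_op=True)  # handles intentionally not kept

    dist.barrier()  # synchronization point for the queued work

    expected = torch.full((4,), float(sum(range(world_size))))
    assert all(torch.equal(t, expected) for t in tensors)
```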
2022-05-18T04:48:13.7211548Z ---------------------------------------------------------------------- 2022-05-18T04:48:15.2980925Z test_broadcast_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:15.3379118Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69736 2022-05-18T04:48:15.3488728Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69737 2022-05-18T04:48:15.3596634Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 69738 2022-05-18T04:48:15.3706963Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 69739 2022-05-18T04:48:16.2972761Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:16.3045715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:16.3094672Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:16.3249424Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:16.5755916Z ok (2.854s) 2022-05-18T04:48:16.5756254Z 2022-05-18T04:48:16.5756659Z ---------------------------------------------------------------------- 2022-05-18T04:48:16.5756985Z Ran 1 test in 2.854s 2022-05-18T04:48:16.5757155Z 2022-05-18T04:48:16.5757252Z OK 2022-05-18T04:48:16.5757387Z 2022-05-18T04:48:16.5757525Z Generating XML reports... 2022-05-18T04:48:16.5799744Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044813.xml 2022-05-18T04:48:17.7609285Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:17.7622969Z 2022-05-18T04:48:17.7623348Z Running tests... 2022-05-18T04:48:17.7624078Z ---------------------------------------------------------------------- 2022-05-18T04:48:19.3198455Z test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:19.3596634Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69919 2022-05-18T04:48:19.3705070Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69920 2022-05-18T04:48:19.3813980Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 69921 2022-05-18T04:48:19.3924027Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 69922 2022-05-18T04:48:20.3757177Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:20.3903268Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:20.3972870Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:20.4540746Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:22.3012057Z ok (4.539s) 2022-05-18T04:48:22.3012722Z 2022-05-18T04:48:22.3013213Z ---------------------------------------------------------------------- 2022-05-18T04:48:22.3013659Z Ran 1 test in 4.539s 2022-05-18T04:48:22.3013865Z 2022-05-18T04:48:22.3013995Z OK 2022-05-18T04:48:22.3014188Z 2022-05-18T04:48:22.3016697Z Generating XML reports... 
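The test_broadcast_basics* runs above exercise the broadcast collective: the source rank's tensor overwrites the tensor held by every other rank. A minimal sketch, assuming an initialized gloo group:

```python
import torch
import torch.distributed as dist


def broadcast_basics() -> None:
    rank = dist.get_rank()

    t = torch.full((4,), float(rank))
    dist.broadcast(t, src=0)

    # After the broadcast every rank holds rank 0's values.
    assert torch.equal(t, torch.zeros(4))
```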
2022-05-18T04:48:22.3055847Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044817.xml 2022-05-18T04:48:23.5086975Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:23.5109540Z 2022-05-18T04:48:23.5110023Z Running tests... 2022-05-18T04:48:23.5110909Z ---------------------------------------------------------------------- 2022-05-18T04:48:25.0998639Z test_broadcast_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:25.1398893Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70106 2022-05-18T04:48:25.1508580Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70107 2022-05-18T04:48:25.1617965Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 70108 2022-05-18T04:48:25.1728155Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 70109 2022-05-18T04:48:26.0701421Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:26.0734532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:26.0776364Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:26.0806935Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:26.3776128Z ok (2.866s) 2022-05-18T04:48:26.3776398Z 2022-05-18T04:48:26.3777190Z ---------------------------------------------------------------------- 2022-05-18T04:48:26.3777933Z Ran 1 test in 2.867s 2022-05-18T04:48:26.3778261Z 2022-05-18T04:48:26.3778426Z OK 2022-05-18T04:48:26.3778583Z 2022-05-18T04:48:26.3778698Z Generating XML reports... 2022-05-18T04:48:26.3822306Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044823.xml 2022-05-18T04:48:27.5753546Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:27.5767094Z 2022-05-18T04:48:27.5767322Z Running tests... 2022-05-18T04:48:27.5767775Z ---------------------------------------------------------------------- 2022-05-18T04:48:29.1440949Z test_broadcast_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:29.1838375Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70289 2022-05-18T04:48:29.1947898Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70290 2022-05-18T04:48:29.2058001Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 70291 2022-05-18T04:48:29.2168446Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 70292 2022-05-18T04:48:30.1153468Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:30.1272911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:30.1861503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:30.1978991Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:30.6220295Z ok (3.045s) 2022-05-18T04:48:30.6220523Z 2022-05-18T04:48:30.6220931Z ---------------------------------------------------------------------- 2022-05-18T04:48:30.6221249Z Ran 1 test in 3.045s 2022-05-18T04:48:30.6221419Z 2022-05-18T04:48:30.6221514Z OK 2022-05-18T04:48:30.6221652Z 2022-05-18T04:48:30.6221788Z Generating XML reports... 2022-05-18T04:48:30.6264169Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044827.xml 2022-05-18T04:48:31.8116518Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:31.8130308Z 2022-05-18T04:48:31.8130763Z Running tests... 2022-05-18T04:48:31.8131266Z ---------------------------------------------------------------------- 2022-05-18T04:48:33.3811047Z test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:33.4199656Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70496 2022-05-18T04:48:33.4307440Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70497 2022-05-18T04:48:33.4415734Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 70498 2022-05-18T04:48:33.4526008Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 70499 2022-05-18T04:48:34.3890805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:34.4059499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:34.4119612Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:34.4212122Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:36.7622500Z ok (4.949s) 2022-05-18T04:48:36.7622731Z 2022-05-18T04:48:36.7623139Z ---------------------------------------------------------------------- 2022-05-18T04:48:36.7623494Z Ran 1 test in 4.949s 2022-05-18T04:48:36.7623913Z 2022-05-18T04:48:36.7624012Z OK 2022-05-18T04:48:36.7624159Z 2022-05-18T04:48:36.7624292Z Generating XML reports... 2022-05-18T04:48:36.7665811Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044831.xml 2022-05-18T04:48:37.9582122Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:37.9602902Z 2022-05-18T04:48:37.9603202Z Running tests... 
2022-05-18T04:48:37.9604036Z ---------------------------------------------------------------------- 2022-05-18T04:48:39.5070215Z test_empty_tensors (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:39.5466661Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70707 2022-05-18T04:48:39.5574599Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70708 2022-05-18T04:48:39.5684347Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 70709 2022-05-18T04:48:39.5796533Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 70710 2022-05-18T04:48:40.4796753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:40.4924691Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:40.5248914Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:40.5503487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:40.7846265Z ok (2.824s) 2022-05-18T04:48:40.7846460Z 2022-05-18T04:48:40.7846863Z ---------------------------------------------------------------------- 2022-05-18T04:48:40.7847206Z Ran 1 test in 2.824s 2022-05-18T04:48:40.7847377Z 2022-05-18T04:48:40.7847478Z OK 2022-05-18T04:48:40.7847614Z 2022-05-18T04:48:40.7848073Z Generating XML reports... 2022-05-18T04:48:40.7890664Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044837.xml 2022-05-18T04:48:41.9647517Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:41.9661191Z 2022-05-18T04:48:41.9661675Z Running tests... 2022-05-18T04:48:41.9662172Z ---------------------------------------------------------------------- 2022-05-18T04:48:43.5403125Z test_gather_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:43.5802509Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70890 2022-05-18T04:48:43.5911747Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70891 2022-05-18T04:48:43.6022144Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 70892 2022-05-18T04:48:43.6135518Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 70893 2022-05-18T04:48:44.5122591Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:44.5221459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:44.5738279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:44.5740853Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:44.8182646Z ok (2.852s) 2022-05-18T04:48:44.8182994Z 2022-05-18T04:48:44.8183841Z ---------------------------------------------------------------------- 2022-05-18T04:48:44.8184185Z Ran 1 test in 2.852s 2022-05-18T04:48:44.8184354Z 2022-05-18T04:48:44.8184451Z OK 2022-05-18T04:48:44.8184597Z 2022-05-18T04:48:44.8184732Z Generating XML reports... 
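The test_gather_basics run above exercises the gather collective: every rank sends one tensor, and only the destination rank receives the full list. A minimal sketch, assuming an initialized gloo group:

```python
import torch
import torch.distributed as dist


def gather_basics() -> None:
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    inp = torch.full((4,), float(rank))
    if rank == 0:
        # Only the destination rank passes a gather_list and receives results.
        gather_list = [torch.empty(4) for _ in range(world_size)]
        dist.gather(inp, gather_list=gather_list, dst=0)
        for src, t in enumerate(gather_list):
            assert torch.equal(t, torch.full((4,), float(src)))
    else:
        dist.gather(inp, dst=0)
```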
2022-05-18T04:48:44.8227055Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044841.xml 2022-05-18T04:48:45.9520646Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:45.9534252Z 2022-05-18T04:48:45.9534694Z Running tests... 2022-05-18T04:48:45.9535202Z ---------------------------------------------------------------------- 2022-05-18T04:48:47.5161538Z test_gather_basics_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:47.5550010Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71073 2022-05-18T04:48:47.5657978Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71074 2022-05-18T04:48:47.5768926Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 71075 2022-05-18T04:48:47.5881710Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 71076 2022-05-18T04:48:48.4715715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:48.5131263Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:48.5352690Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:48.6057228Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:50.3966852Z ok (4.443s) 2022-05-18T04:48:50.3967406Z 2022-05-18T04:48:50.3967854Z ---------------------------------------------------------------------- 2022-05-18T04:48:50.3968203Z Ran 1 test in 4.443s 2022-05-18T04:48:50.3968373Z 2022-05-18T04:48:50.3968449Z OK 2022-05-18T04:48:50.3968582Z 2022-05-18T04:48:50.3968741Z Generating XML reports... 2022-05-18T04:48:50.4010382Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044845.xml 2022-05-18T04:48:51.5865672Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:51.5879521Z 2022-05-18T04:48:51.5879666Z Running tests... 2022-05-18T04:48:51.5880423Z ---------------------------------------------------------------------- 2022-05-18T04:48:53.1497433Z test_gather_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:53.1898130Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71260 2022-05-18T04:48:53.2009199Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71261 2022-05-18T04:48:53.2120362Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 71262 2022-05-18T04:48:53.2232925Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 71263 2022-05-18T04:48:54.1448868Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:54.1912545Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:54.1972816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:54.1978142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:54.5283383Z ok (2.940s) 2022-05-18T04:48:54.5283628Z 2022-05-18T04:48:54.5284050Z ---------------------------------------------------------------------- 2022-05-18T04:48:54.5284390Z Ran 1 test in 2.940s 2022-05-18T04:48:54.5284553Z 2022-05-18T04:48:54.5284652Z OK 2022-05-18T04:48:54.5284768Z 2022-05-18T04:48:54.5284909Z Generating XML reports... 2022-05-18T04:48:54.5328270Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044851.xml 2022-05-18T04:48:55.7075382Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:55.7088594Z 2022-05-18T04:48:55.7089012Z Running tests... 2022-05-18T04:48:55.7089485Z ---------------------------------------------------------------------- 2022-05-18T04:48:57.2390941Z test_gather_noncontiguous_input (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:48:57.2786476Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71443 2022-05-18T04:48:57.2892218Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71444 2022-05-18T04:48:57.3002619Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 71445 2022-05-18T04:48:57.3110848Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 71446 2022-05-18T04:48:58.2791935Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:48:58.3119340Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:58.3333777Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:58.3688268Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:48:58.6163416Z ok (2.907s) 2022-05-18T04:48:58.6163695Z 2022-05-18T04:48:58.6164101Z ---------------------------------------------------------------------- 2022-05-18T04:48:58.6164449Z Ran 1 test in 2.907s 2022-05-18T04:48:58.6164617Z 2022-05-18T04:48:58.6164720Z OK 2022-05-18T04:48:58.6164837Z 2022-05-18T04:48:58.6164978Z Generating XML reports... 2022-05-18T04:48:58.6207103Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044855.xml 2022-05-18T04:48:59.7826552Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:48:59.7840758Z 2022-05-18T04:48:59.7840909Z Running tests... 
2022-05-18T04:48:59.7841723Z ---------------------------------------------------------------------- 2022-05-18T04:49:01.3685794Z test_gather_stress (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:01.4080074Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71626 2022-05-18T04:49:01.4191442Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71627 2022-05-18T04:49:01.4302451Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 71628 2022-05-18T04:49:01.4414421Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 71629 2022-05-18T04:49:02.3784214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:02.4324649Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:02.4370558Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:02.4453763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:03.3479409Z ok (3.564s) 2022-05-18T04:49:03.3479774Z 2022-05-18T04:49:03.3480200Z ---------------------------------------------------------------------- 2022-05-18T04:49:03.3480544Z Ran 1 test in 3.564s 2022-05-18T04:49:03.3480710Z 2022-05-18T04:49:03.3480813Z OK 2022-05-18T04:49:03.3480946Z 2022-05-18T04:49:03.3481842Z Generating XML reports... 2022-05-18T04:49:03.3523921Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044859.xml 2022-05-18T04:49:04.5370315Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:04.5384001Z 2022-05-18T04:49:04.5384332Z Running tests... 2022-05-18T04:49:04.5384777Z ---------------------------------------------------------------------- 2022-05-18T04:49:06.1221196Z test_gather_stress_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:06.1620217Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71833 2022-05-18T04:49:06.1731542Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71834 2022-05-18T04:49:06.1844103Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 71835 2022-05-18T04:49:06.1956586Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 71836 2022-05-18T04:49:07.0917478Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:07.0936860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:07.2021985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:07.2220647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:10.7079161Z ok (6.169s) 2022-05-18T04:49:10.7079663Z 2022-05-18T04:49:10.7080520Z ---------------------------------------------------------------------- 2022-05-18T04:49:10.7080899Z Ran 1 test in 6.169s 2022-05-18T04:49:10.7081076Z 2022-05-18T04:49:10.7081180Z OK 2022-05-18T04:49:10.7081318Z 2022-05-18T04:49:10.7081453Z Generating XML reports... 
2022-05-18T04:49:10.7123753Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044904.xml 2022-05-18T04:49:11.8986793Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:11.9001016Z 2022-05-18T04:49:11.9001538Z Running tests... 2022-05-18T04:49:11.9002047Z ---------------------------------------------------------------------- 2022-05-18T04:49:13.4941102Z test_multi_device_constructor (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:13.5339538Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72044 2022-05-18T04:49:13.5449134Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72045 2022-05-18T04:49:13.5557695Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 72046 2022-05-18T04:49:13.5668581Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 72047 2022-05-18T04:49:14.4915667Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:14.5128513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:14.5309177Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:14.5318084Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:14.8718831Z ok (2.971s) 2022-05-18T04:49:14.8719178Z 2022-05-18T04:49:14.8719621Z ---------------------------------------------------------------------- 2022-05-18T04:49:14.8719967Z Ran 1 test in 2.972s 2022-05-18T04:49:14.8720133Z 2022-05-18T04:49:14.8720254Z OK 2022-05-18T04:49:14.8720391Z 2022-05-18T04:49:14.8720525Z Generating XML reports... 2022-05-18T04:49:14.8762726Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044911.xml 2022-05-18T04:49:16.0419342Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:16.0433783Z 2022-05-18T04:49:16.0433938Z Running tests... 2022-05-18T04:49:16.0434371Z ---------------------------------------------------------------------- 2022-05-18T04:49:17.6059474Z test_reduce_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:17.6457843Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72231 2022-05-18T04:49:17.6567000Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72232 2022-05-18T04:49:17.6678034Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 72233 2022-05-18T04:49:17.6791534Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 72234 2022-05-18T04:49:18.6079838Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:18.6301662Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:18.6346594Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:18.6447468Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:18.9841742Z ok (2.940s) 2022-05-18T04:49:18.9841958Z 2022-05-18T04:49:18.9842352Z ---------------------------------------------------------------------- 2022-05-18T04:49:18.9842693Z Ran 1 test in 2.941s 2022-05-18T04:49:18.9842840Z 2022-05-18T04:49:18.9842942Z OK 2022-05-18T04:49:18.9843075Z 2022-05-18T04:49:18.9843209Z Generating XML reports... 2022-05-18T04:49:18.9885743Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044916.xml 2022-05-18T04:49:20.1617575Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:20.1632114Z 2022-05-18T04:49:20.1632391Z Running tests... 2022-05-18T04:49:20.1633074Z ---------------------------------------------------------------------- 2022-05-18T04:49:21.7488108Z test_reduce_basics_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:21.7888129Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72414 2022-05-18T04:49:21.7998624Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72415 2022-05-18T04:49:21.8107827Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 72416 2022-05-18T04:49:21.8218749Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 72417 2022-05-18T04:49:22.7346687Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:22.7392802Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:22.7763810Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:22.7931974Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:24.7306460Z ok (4.567s) 2022-05-18T04:49:24.7306669Z 2022-05-18T04:49:24.7307079Z ---------------------------------------------------------------------- 2022-05-18T04:49:24.7307426Z Ran 1 test in 4.567s 2022-05-18T04:49:24.7307599Z 2022-05-18T04:49:24.7308253Z OK 2022-05-18T04:49:24.7308609Z 2022-05-18T04:49:24.7308912Z Generating XML reports... 2022-05-18T04:49:24.7351859Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044920.xml 2022-05-18T04:49:25.9252382Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:25.9266704Z 2022-05-18T04:49:25.9267199Z Running tests... 
2022-05-18T04:49:25.9267686Z ---------------------------------------------------------------------- 2022-05-18T04:49:27.5145037Z test_reduce_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:27.5541565Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72601 2022-05-18T04:49:27.5650936Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72602 2022-05-18T04:49:27.5760097Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 72603 2022-05-18T04:49:27.5871730Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 72604 2022-05-18T04:49:28.5677400Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:28.5707096Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:28.5721462Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:28.5759136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:28.8923084Z ok (2.965s) 2022-05-18T04:49:28.8923340Z 2022-05-18T04:49:28.8923931Z ---------------------------------------------------------------------- 2022-05-18T04:49:28.8924265Z Ran 1 test in 2.966s 2022-05-18T04:49:28.8924453Z 2022-05-18T04:49:28.8924555Z OK 2022-05-18T04:49:28.8924690Z 2022-05-18T04:49:28.8926593Z Generating XML reports... 2022-05-18T04:49:28.8966817Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044925.xml 2022-05-18T04:49:30.0868494Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:30.0882817Z 2022-05-18T04:49:30.0883203Z Running tests... 2022-05-18T04:49:30.0883720Z ---------------------------------------------------------------------- 2022-05-18T04:49:31.6887744Z test_reduce_stress (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:31.7278561Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72784 2022-05-18T04:49:31.7387976Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72785 2022-05-18T04:49:31.7498016Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 72786 2022-05-18T04:49:31.7611121Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 72787 2022-05-18T04:49:32.6558081Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:32.6770684Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:32.6795039Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:32.7271012Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:33.3668312Z ok (3.278s) 2022-05-18T04:49:33.3668662Z 2022-05-18T04:49:33.3669436Z ---------------------------------------------------------------------- 2022-05-18T04:49:33.3669989Z Ran 1 test in 3.278s 2022-05-18T04:49:33.3670160Z 2022-05-18T04:49:33.3670255Z OK 2022-05-18T04:49:33.3670396Z 2022-05-18T04:49:33.3670530Z Generating XML reports... 
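For readers following the ProcessGroupGlooTest output above (gather/reduce basics and their stress variants), the following is a minimal, illustrative sketch of the collectives those cases exercise on the gloo backend. It is not taken from the test file; the address, port, world size, and tensor shapes are assumptions.

# Minimal sketch (not from this log or the test file): gather/reduce on the
# gloo backend across 4 local ranks, roughly what ProcessGroupGlooTest covers.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"   # illustrative rendezvous values
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # reduce: every rank contributes its rank id; rank 0 ends up with the sum
    t = torch.tensor([float(rank)])
    dist.reduce(t, dst=0, op=dist.ReduceOp.SUM)

    # gather: rank 0 collects one tensor from every rank
    out = [torch.zeros(1) for _ in range(world_size)] if rank == 0 else None
    dist.gather(torch.tensor([float(rank)]), gather_list=out, dst=0)

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(4,), nprocs=4)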
2022-05-18T04:49:33.3712047Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044930.xml 2022-05-18T04:49:34.5552268Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:34.5566030Z 2022-05-18T04:49:34.5566328Z Running tests... 2022-05-18T04:49:34.5566783Z ---------------------------------------------------------------------- 2022-05-18T04:49:36.1488020Z test_reduce_stress_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:36.1877189Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72991 2022-05-18T04:49:36.1985987Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72992 2022-05-18T04:49:36.2094736Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 72993 2022-05-18T04:49:36.2204185Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 72994 2022-05-18T04:49:37.1217870Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:37.1466392Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:37.2432729Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:37.2469293Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:39.9310308Z ok (5.374s) 2022-05-18T04:49:39.9310533Z 2022-05-18T04:49:39.9310924Z ---------------------------------------------------------------------- 2022-05-18T04:49:39.9311271Z Ran 1 test in 5.374s 2022-05-18T04:49:39.9311463Z 2022-05-18T04:49:39.9311540Z OK 2022-05-18T04:49:39.9311682Z 2022-05-18T04:49:39.9311819Z Generating XML reports... 2022-05-18T04:49:39.9355526Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044934.xml 2022-05-18T04:49:41.1367542Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:41.1381226Z 2022-05-18T04:49:41.1381688Z Running tests... 2022-05-18T04:49:41.1382185Z ---------------------------------------------------------------------- 2022-05-18T04:49:42.7050989Z test_round_robin (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:42.7439847Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73202 2022-05-18T04:49:42.7547971Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73203 2022-05-18T04:49:42.7658444Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 73204 2022-05-18T04:49:42.7768102Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 73205 2022-05-18T04:49:43.6837957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:43.7321165Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:43.7517537Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:43.7685121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:44.0817007Z ok (2.943s) 2022-05-18T04:49:44.0817249Z 2022-05-18T04:49:44.0817860Z ---------------------------------------------------------------------- 2022-05-18T04:49:44.0818223Z Ran 1 test in 2.944s 2022-05-18T04:49:44.0818393Z 2022-05-18T04:49:44.0818470Z OK 2022-05-18T04:49:44.0818607Z 2022-05-18T04:49:44.0818752Z Generating XML reports... 2022-05-18T04:49:44.0860542Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044941.xml 2022-05-18T04:49:45.2526257Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:45.2539780Z 2022-05-18T04:49:45.2539977Z Running tests... 2022-05-18T04:49:45.2540422Z ---------------------------------------------------------------------- 2022-05-18T04:49:46.8146987Z test_round_robin_create_destroy (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:46.8534229Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73397 2022-05-18T04:49:46.8642583Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73398 2022-05-18T04:49:46.8752420Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 73399 2022-05-18T04:49:46.8864362Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 73400 2022-05-18T04:49:47.7857946Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:47.7980367Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:47.8012138Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:47.8318537Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:48.2916710Z ok (3.037s) 2022-05-18T04:49:48.2916939Z 2022-05-18T04:49:48.2917357Z ---------------------------------------------------------------------- 2022-05-18T04:49:48.2917702Z Ran 1 test in 3.038s 2022-05-18T04:49:48.2917877Z 2022-05-18T04:49:48.2917966Z OK 2022-05-18T04:49:48.2918105Z 2022-05-18T04:49:48.2918240Z Generating XML reports... 2022-05-18T04:49:48.2961277Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044945.xml 2022-05-18T04:49:49.4767526Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:49.4780774Z 2022-05-18T04:49:49.4781071Z Running tests... 
2022-05-18T04:49:49.4781533Z ---------------------------------------------------------------------- 2022-05-18T04:49:51.0260612Z test_scatter_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:51.0651340Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73616 2022-05-18T04:49:51.0760574Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73617 2022-05-18T04:49:51.0871099Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 73618 2022-05-18T04:49:51.0981369Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 73619 2022-05-18T04:49:51.9820392Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:52.0126345Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:52.0339381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:52.0836086Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:52.3030016Z ok (2.825s) 2022-05-18T04:49:52.3030408Z 2022-05-18T04:49:52.3031085Z ---------------------------------------------------------------------- 2022-05-18T04:49:52.3031722Z Ran 1 test in 2.825s 2022-05-18T04:49:52.3032024Z 2022-05-18T04:49:52.3032187Z OK 2022-05-18T04:49:52.3032444Z 2022-05-18T04:49:52.3032690Z Generating XML reports... 2022-05-18T04:49:52.3075605Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044949.xml 2022-05-18T04:49:53.4863332Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:53.4877824Z 2022-05-18T04:49:53.4878097Z Running tests... 2022-05-18T04:49:53.4878542Z ---------------------------------------------------------------------- 2022-05-18T04:49:55.0556900Z test_scatter_basics_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:49:55.0947991Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73799 2022-05-18T04:49:55.1055508Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73800 2022-05-18T04:49:55.1164855Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 73801 2022-05-18T04:49:55.1275417Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 73802 2022-05-18T04:49:56.0356932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:49:56.0737596Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:56.0881503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:56.1095042Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:49:57.9361449Z ok (4.448s) 2022-05-18T04:49:57.9361664Z 2022-05-18T04:49:57.9362084Z ---------------------------------------------------------------------- 2022-05-18T04:49:57.9362429Z Ran 1 test in 4.448s 2022-05-18T04:49:57.9362597Z 2022-05-18T04:49:57.9362695Z OK 2022-05-18T04:49:57.9362836Z 2022-05-18T04:49:57.9362953Z Generating XML reports... 
2022-05-18T04:49:57.9406331Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044953.xml 2022-05-18T04:49:59.1203262Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:49:59.1216978Z 2022-05-18T04:49:59.1217136Z Running tests... 2022-05-18T04:49:59.1217832Z ---------------------------------------------------------------------- 2022-05-18T04:50:00.6624191Z test_scatter_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:00.7015212Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73986 2022-05-18T04:50:00.7124207Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73987 2022-05-18T04:50:00.7233256Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 73988 2022-05-18T04:50:00.7342183Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 73989 2022-05-18T04:50:01.6996106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:50:01.7000308Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:01.7315926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:01.7526245Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:50:02.0393977Z ok (2.917s) 2022-05-18T04:50:02.0394212Z 2022-05-18T04:50:02.0394637Z ---------------------------------------------------------------------- 2022-05-18T04:50:02.0395314Z Ran 1 test in 2.918s 2022-05-18T04:50:02.0395506Z 2022-05-18T04:50:02.0395585Z OK 2022-05-18T04:50:02.0395727Z 2022-05-18T04:50:02.0395865Z Generating XML reports... 2022-05-18T04:50:02.0438066Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044959.xml 2022-05-18T04:50:03.2574961Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:03.2588861Z 2022-05-18T04:50:03.2589320Z Running tests... 2022-05-18T04:50:03.2589803Z ---------------------------------------------------------------------- 2022-05-18T04:50:04.8410498Z test_scatter_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:04.8811953Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74169 2022-05-18T04:50:04.8921507Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74170 2022-05-18T04:50:04.9032665Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 74171 2022-05-18T04:50:04.9143166Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 74172 2022-05-18T04:50:05.8395272Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:05.8873941Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:50:05.8954711Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:05.8956657Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:50:06.8207176Z ok (3.561s) 2022-05-18T04:50:06.8207406Z 2022-05-18T04:50:06.8207821Z ---------------------------------------------------------------------- 2022-05-18T04:50:06.8208166Z Ran 1 test in 3.562s 2022-05-18T04:50:06.8208331Z 2022-05-18T04:50:06.8208407Z OK 2022-05-18T04:50:06.8208542Z 2022-05-18T04:50:06.8208677Z Generating XML reports... 2022-05-18T04:50:06.8251922Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045003.xml 2022-05-18T04:50:08.0093166Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:08.0107153Z 2022-05-18T04:50:08.0107384Z Running tests... 2022-05-18T04:50:08.0107829Z ---------------------------------------------------------------------- 2022-05-18T04:50:08.0115088Z test_scatter_stress_cuda (__main__.ProcessGroupGlooTest) ... skip: Test is flaky, see https://github.com/pytorch/pytorch/issues/15963 (0.001s) 2022-05-18T04:50:08.0115446Z 2022-05-18T04:50:08.0115735Z ---------------------------------------------------------------------- 2022-05-18T04:50:08.0116073Z Ran 1 test in 0.001s 2022-05-18T04:50:08.0116235Z 2022-05-18T04:50:08.0116346Z OK (skipped=1) 2022-05-18T04:50:08.0116483Z 2022-05-18T04:50:08.0116608Z Generating XML reports... 2022-05-18T04:50:08.0151253Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045008.xml 2022-05-18T04:50:09.0246766Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:09.0259856Z 2022-05-18T04:50:09.0260295Z Running tests... 2022-05-18T04:50:09.0260809Z ---------------------------------------------------------------------- 2022-05-18T04:50:10.5865681Z test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:10.6260333Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74409 2022-05-18T04:50:10.6371472Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74410 2022-05-18T04:50:10.6483509Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 74411 2022-05-18T04:50:10.6595736Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 74412 2022-05-18T04:50:11.5768817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:11.6396302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:11.6410474Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:50:11.6472918Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:50:11.9645759Z ok (2.938s) 2022-05-18T04:50:11.9646092Z 2022-05-18T04:50:11.9646534Z ---------------------------------------------------------------------- 2022-05-18T04:50:11.9646884Z Ran 1 test in 2.939s 2022-05-18T04:50:11.9647052Z 2022-05-18T04:50:11.9647149Z OK 2022-05-18T04:50:11.9647286Z 2022-05-18T04:50:11.9647404Z Generating XML reports... 2022-05-18T04:50:11.9689122Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045009.xml 2022-05-18T04:50:13.1580959Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:13.1596460Z 2022-05-18T04:50:13.1596901Z Running tests... 2022-05-18T04:50:13.1597642Z ---------------------------------------------------------------------- 2022-05-18T04:50:13.1602848Z test_sparse_allreduce_basics (__main__.ProcessGroupGlooTest) ... skip: intermittent failures on Windows, in CI (0.000s) 2022-05-18T04:50:13.1603180Z 2022-05-18T04:50:13.1603800Z ---------------------------------------------------------------------- 2022-05-18T04:50:13.1604163Z Ran 1 test in 0.001s 2022-05-18T04:50:13.1604332Z 2022-05-18T04:50:13.1604443Z OK (skipped=1) 2022-05-18T04:50:13.1604599Z 2022-05-18T04:50:13.1604725Z Generating XML reports... 2022-05-18T04:50:13.1640524Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045013.xml 2022-05-18T04:50:14.1902827Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:14.1917075Z 2022-05-18T04:50:14.1917541Z Running tests... 2022-05-18T04:50:14.1918049Z ---------------------------------------------------------------------- 2022-05-18T04:50:15.7632609Z test_sparse_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:15.8033048Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74625 2022-05-18T04:50:15.8142335Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74626 2022-05-18T04:50:15.8253846Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 74627 2022-05-18T04:50:15.8367715Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 74628 2022-05-18T04:50:16.7469895Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:16.7636299Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:16.7712621Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:50:16.8313387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:50:18.9459800Z ok (4.754s) 2022-05-18T04:50:18.9460136Z 2022-05-18T04:50:18.9460557Z ---------------------------------------------------------------------- 2022-05-18T04:50:18.9460902Z Ran 1 test in 4.754s 2022-05-18T04:50:18.9461071Z 2022-05-18T04:50:18.9461187Z OK 2022-05-18T04:50:18.9461325Z 2022-05-18T04:50:18.9461442Z Generating XML reports... 2022-05-18T04:50:18.9505219Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045014.xml 2022-05-18T04:50:20.1580665Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:20.1594831Z 2022-05-18T04:50:20.1595071Z Running tests... 2022-05-18T04:50:20.1595515Z ---------------------------------------------------------------------- 2022-05-18T04:50:21.7602186Z test_sparse_allreduce_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:21.8005106Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74992 2022-05-18T04:50:21.8116706Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74993 2022-05-18T04:50:21.8230164Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 74994 2022-05-18T04:50:21.8345065Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 74995 2022-05-18T04:50:22.7832514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:22.8167590Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:22.8418818Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:50:22.8743772Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:50:23.1397681Z ok (2.980s) 2022-05-18T04:50:23.1398030Z 2022-05-18T04:50:23.1398854Z ---------------------------------------------------------------------- 2022-05-18T04:50:23.1399386Z Ran 1 test in 2.980s 2022-05-18T04:50:23.1399558Z 2022-05-18T04:50:23.1399655Z OK 2022-05-18T04:50:23.1399791Z 2022-05-18T04:50:23.1399928Z Generating XML reports... 2022-05-18T04:50:23.1443359Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045020.xml 2022-05-18T04:50:24.3214098Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:24.3227747Z 2022-05-18T04:50:24.3227981Z Running tests... 
2022-05-18T04:50:24.3228887Z ---------------------------------------------------------------------- 2022-05-18T04:50:24.3299754Z test_forward_backward (__main__.ReducerTest) ... ok (0.007s) 2022-05-18T04:50:24.3347530Z 2022-05-18T04:50:24.3347941Z ---------------------------------------------------------------------- 2022-05-18T04:50:24.3348297Z Ran 1 test in 0.012s 2022-05-18T04:50:24.3348481Z 2022-05-18T04:50:24.3348579Z OK 2022-05-18T04:50:24.3348716Z 2022-05-18T04:50:24.3348845Z Generating XML reports... 2022-05-18T04:50:24.3382826Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045024.xml 2022-05-18T04:50:25.3661683Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:25.3675970Z 2022-05-18T04:50:25.3676225Z Running tests... 2022-05-18T04:50:25.3676786Z ---------------------------------------------------------------------- 2022-05-18T04:50:25.3768382Z test_forward_backward_optimizer (__main__.ReducerTest) ... [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T04:50:25.3792854Z ok (0.011s) 2022-05-18T04:50:25.3800007Z 2022-05-18T04:50:25.3800569Z ---------------------------------------------------------------------- 2022-05-18T04:50:25.3800972Z Ran 1 test in 0.012s 2022-05-18T04:50:25.3801148Z 2022-05-18T04:50:25.3801226Z OK 2022-05-18T04:50:25.3801367Z 2022-05-18T04:50:25.3801499Z Generating XML reports... 2022-05-18T04:50:25.3837007Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045025.xml 2022-05-18T04:50:26.4130809Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:26.4144335Z 2022-05-18T04:50:26.4144629Z Running tests... 2022-05-18T04:50:26.4145056Z ---------------------------------------------------------------------- 2022-05-18T04:50:26.4220952Z test_forward_backward_unused_parameters (__main__.ReducerTest) ... ok (0.007s) 2022-05-18T04:50:26.4266194Z 2022-05-18T04:50:26.4266846Z ---------------------------------------------------------------------- 2022-05-18T04:50:26.4267206Z Ran 1 test in 0.012s 2022-05-18T04:50:26.4267377Z 2022-05-18T04:50:26.4267482Z OK 2022-05-18T04:50:26.4267620Z 2022-05-18T04:50:26.4267753Z Generating XML reports... 2022-05-18T04:50:26.4301558Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045026.xml 2022-05-18T04:50:27.4686937Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:27.4700846Z 2022-05-18T04:50:27.4701279Z Running tests... 2022-05-18T04:50:27.4701763Z ---------------------------------------------------------------------- 2022-05-18T04:50:27.4740416Z test_multi_dtype_multi_bucket (__main__.ReducerTest) ... 
ok (0.004s) 2022-05-18T04:50:27.4819000Z 2022-05-18T04:50:27.4819359Z ---------------------------------------------------------------------- 2022-05-18T04:50:27.4819712Z Ran 1 test in 0.012s 2022-05-18T04:50:27.4819865Z 2022-05-18T04:50:27.4819960Z OK 2022-05-18T04:50:27.4820097Z 2022-05-18T04:50:27.4820230Z Generating XML reports... 2022-05-18T04:50:27.4854744Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045027.xml 2022-05-18T04:50:28.5192274Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:28.5205994Z 2022-05-18T04:50:28.5206429Z Running tests... 2022-05-18T04:50:28.5206934Z ---------------------------------------------------------------------- 2022-05-18T04:50:28.5270564Z test_multi_dtype_single_bucket (__main__.ReducerTest) ... ok (0.006s) 2022-05-18T04:50:28.5323169Z 2022-05-18T04:50:28.5323512Z ---------------------------------------------------------------------- 2022-05-18T04:50:28.5323876Z Ran 1 test in 0.012s 2022-05-18T04:50:28.5324046Z 2022-05-18T04:50:28.5324144Z OK 2022-05-18T04:50:28.5324281Z 2022-05-18T04:50:28.5324412Z Generating XML reports... 2022-05-18T04:50:28.5358480Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045028.xml 2022-05-18T04:50:29.5600355Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:29.5614349Z 2022-05-18T04:50:29.5614533Z Running tests... 2022-05-18T04:50:29.5615491Z ---------------------------------------------------------------------- 2022-05-18T04:50:29.5649553Z test_single_dtype_single_bucket (__main__.ReducerTest) ... ok (0.003s) 2022-05-18T04:50:29.5731070Z 2022-05-18T04:50:29.5731477Z ---------------------------------------------------------------------- 2022-05-18T04:50:29.5731819Z Ran 1 test in 0.012s 2022-05-18T04:50:29.5731997Z 2022-05-18T04:50:29.5732097Z OK 2022-05-18T04:50:29.5732235Z 2022-05-18T04:50:29.5732340Z Generating XML reports... 2022-05-18T04:50:29.5766811Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045029.xml 2022-05-18T04:50:30.6057981Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:30.6071886Z 2022-05-18T04:50:30.6072159Z Running tests... 2022-05-18T04:50:30.6072590Z ---------------------------------------------------------------------- 2022-05-18T04:50:32.1951639Z test_logging_init (__main__.RendezvousEnvTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:32.2115896Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:32.2118397Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:50:32.2215288Z ok (1.614s) 2022-05-18T04:50:32.2216837Z 2022-05-18T04:50:32.2217248Z ---------------------------------------------------------------------- 2022-05-18T04:50:32.2217602Z Ran 1 test in 1.614s 2022-05-18T04:50:32.2217776Z 2022-05-18T04:50:32.2217872Z OK 2022-05-18T04:50:32.2218309Z 2022-05-18T04:50:32.2218458Z Generating XML reports... 
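The ReducerTest block above includes a warning that find_unused_parameters=True was passed to the DDP constructor even though every parameter received a gradient in the forward pass. A minimal sketch of a construction of that shape follows; the toy model and the single-rank gloo setup are assumptions for illustration, not the test's own code.

# Sketch only: find_unused_parameters=True on DDP when no parameter is
# actually unused is what the reducer warning above refers to; it costs an
# extra traversal of the autograd graph per iteration.
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # illustrative values
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)  # single-rank CPU group

model = nn.Linear(8, 8)
ddp_model = DDP(model, find_unused_parameters=True)  # the flag from the warning

loss = ddp_model(torch.randn(4, 8)).sum()
loss.backward()  # every parameter gets a gradient, so the flag was unnecessary

dist.destroy_process_group()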
2022-05-18T04:50:32.2250421Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-RendezvousEnvTest-20220518045030.xml 2022-05-18T04:50:33.3817355Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T04:50:33.3831915Z 2022-05-18T04:50:33.3832185Z Running tests... 2022-05-18T04:50:33.3832610Z ---------------------------------------------------------------------- 2022-05-18T04:50:34.9717440Z test_default_store_timeout_gloo (__main__.TimeoutTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:34.9868338Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/74714 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. (1.603s) 2022-05-18T04:50:34.9869040Z 2022-05-18T04:50:34.9869323Z ---------------------------------------------------------------------- 2022-05-18T04:50:34.9869667Z Ran 1 test in 1.604s 2022-05-18T04:50:34.9869829Z 2022-05-18T04:50:34.9869940Z OK (skipped=1) 2022-05-18T04:50:34.9870099Z 2022-05-18T04:50:34.9870222Z Generating XML reports... 2022-05-18T04:50:34.9902599Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-TimeoutTest-20220518045033.xml 2022-05-18T04:50:35.3742588Z Running distributed/fsdp/test_fsdp_mixed_precision ... [2022-05-18 04:50:35.373696] 2022-05-18T04:50:35.3743414Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_mixed_precision.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 04:50:35.373805] 2022-05-18T04:50:37.8720921Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:37.8802440Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_mixed_precision 2022-05-18T04:50:37.8830996Z 2022-05-18T04:50:37.8831319Z Running tests... 2022-05-18T04:50:37.8831737Z ---------------------------------------------------------------------- 2022-05-18T04:50:37.9227443Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75515 2022-05-18T04:50:37.9339329Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75516 2022-05-18T04:50:40.5367664Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:40.5373965Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:40.5418945Z dist init r=1, world=2 2022-05-18T04:50:40.5423833Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:40.5425061Z dist init r=0, world=2 2022-05-18T04:50:40.5430041Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:40.5430910Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:40.5527222Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
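The remainder of this block is the parametrized TestFSDPMixedPrecisionSharded suite; each test name appears to encode an FSDP configuration (full_shard, mp_fp16, offload_true/false, prefetch_pre/post, and an optional sharded grad scaler). As a rough, illustrative sketch of those options, and not the test file's actual setup, an FSDP wrapper combining them might look like the following (the toy model is an assumption, and a process group plus a CUDA device are presumed to be available):

# Sketch only: FSDP options that the parametrized test names above appear to
# encode. Assumes torch.distributed is already initialized and CUDA is usable.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import (
    MixedPrecision, CPUOffload, ShardingStrategy, BackwardPrefetch,
)

model = nn.Linear(8, 8).cuda()
fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,              # "full_shard"
    mixed_precision=MixedPrecision(param_dtype=torch.float16,
                                   reduce_dtype=torch.float16,
                                   buffer_dtype=torch.float16),  # "mp_fp16"
    cpu_offload=CPUOffload(offload_params=True),                 # "offload_true"
    backward_prefetch=BackwardPrefetch.BACKWARD_PRE,             # "prefetch_pre"
)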
2022-05-18T04:50:41.5961380Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:41.5962199Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:42.3456614Z ok (4.462s) 2022-05-18T04:50:42.3592913Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75599 2022-05-18T04:50:42.3705314Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75600 2022-05-18T04:50:44.9426403Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:44.9476742Z dist init r=1, world=2 2022-05-18T04:50:44.9481700Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:44.9554546Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:44.9604901Z dist init r=0, world=2 2022-05-18T04:50:44.9609606Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:44.9610427Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:44.9686467Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:45.9740419Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:45.9740953Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:46.6815724Z ok (4.336s) 2022-05-18T04:50:46.6953725Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75683 2022-05-18T04:50:46.7061265Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75684 2022-05-18T04:50:49.2500036Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:49.2551509Z dist init r=1, world=2 2022-05-18T04:50:49.2556777Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:49.2668551Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:49.2720617Z dist init r=0, world=2 2022-05-18T04:50:49.2725729Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:49.2726827Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:49.2761792Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:50.2830426Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:50.2830960Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:51.0169356Z ok (4.335s) 2022-05-18T04:50:51.0307505Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75767 2022-05-18T04:50:51.0415315Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75768 2022-05-18T04:50:53.6030770Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:53.6064690Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:53.6081858Z dist init r=0, world=2 2022-05-18T04:50:53.6086805Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:53.6114777Z dist init r=1, world=2 2022-05-18T04:50:53.6119103Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:53.6120320Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:53.6190462Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:54.6221528Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:54.6222061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:55.3523842Z ok (4.335s) 2022-05-18T04:50:55.3659690Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75851 2022-05-18T04:50:55.3768430Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75852 2022-05-18T04:50:57.9169807Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:57.9220235Z dist init r=0, world=2 2022-05-18T04:50:57.9225598Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:57.9308890Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:57.9360829Z dist init r=1, world=2 2022-05-18T04:50:57.9365933Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:57.9366882Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:57.9431411Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:58.9820755Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:58.9821360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:59.6876236Z ok (4.335s) 2022-05-18T04:50:59.7013995Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75935 2022-05-18T04:50:59.7124758Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75936 2022-05-18T04:51:02.2419714Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:02.2471084Z dist init r=0, world=2 2022-05-18T04:51:02.2476376Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:02.2651142Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:02.2703528Z dist init r=1, world=2 2022-05-18T04:51:02.2708349Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:02.2709640Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:02.2783733Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:03.3224726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:03.3225291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:04.0235030Z ok (4.336s) 2022-05-18T04:51:04.0372395Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76019 2022-05-18T04:51:04.0480231Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76020 2022-05-18T04:51:06.6013791Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:06.6065551Z dist init r=1, world=2 2022-05-18T04:51:06.6071304Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:06.6304805Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:06.6356563Z dist init r=0, world=2 2022-05-18T04:51:06.6360973Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:06.6362165Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:06.6377849Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:07.6753940Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:07.6754459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:08.3590489Z ok (4.335s) 2022-05-18T04:51:08.3728227Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76103 2022-05-18T04:51:08.3836706Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76104 2022-05-18T04:51:10.9176836Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:10.9226771Z dist init r=1, world=2 2022-05-18T04:51:10.9232088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:10.9689842Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:10.9741453Z dist init r=0, world=2 2022-05-18T04:51:10.9746486Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:10.9747589Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:10.9843639Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:12.0016551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:12.0017087Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:12.6945971Z ok (4.335s) 2022-05-18T04:51:12.7085387Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76187 2022-05-18T04:51:12.7194539Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76188 2022-05-18T04:51:15.2807709Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:15.2858886Z dist init r=0, world=2 2022-05-18T04:51:15.2864519Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:15.2939068Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:15.2989667Z dist init r=1, world=2 2022-05-18T04:51:15.2993975Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:15.2994923Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:15.3069962Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:16.3353646Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:16.3354182Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:17.0306254Z ok (4.336s) 2022-05-18T04:51:17.0445843Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76271 2022-05-18T04:51:17.0556256Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76272 2022-05-18T04:51:19.5758508Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:19.5808947Z dist init r=0, world=2 2022-05-18T04:51:19.5814262Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:19.5968364Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:19.6020253Z dist init r=1, world=2 2022-05-18T04:51:19.6024758Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:19.6026239Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:19.6121229Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:20.6367871Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:20.6368709Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:21.3666413Z ok (4.336s) 2022-05-18T04:51:21.3802748Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76355 2022-05-18T04:51:21.3910215Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76356 2022-05-18T04:51:23.9308603Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:23.9359542Z dist init r=0, world=2 2022-05-18T04:51:23.9364807Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:23.9504208Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:23.9556791Z dist init r=1, world=2 2022-05-18T04:51:23.9561627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:23.9562840Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:23.9569292Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:25.0003924Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:25.0004824Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:25.7020034Z ok (4.335s) 2022-05-18T04:51:25.7159884Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76439 2022-05-18T04:51:25.7267504Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76440 2022-05-18T04:51:28.2533690Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:28.2583299Z dist init r=1, world=2 2022-05-18T04:51:28.2588198Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:28.2651192Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:28.2703247Z dist init r=0, world=2 2022-05-18T04:51:28.2708188Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:28.2709343Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:28.2793135Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:29.2981246Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:29.2981771Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:29.9393373Z ok (4.237s) 2022-05-18T04:51:29.9530737Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76523 2022-05-18T04:51:29.9638671Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76524 2022-05-18T04:51:32.5061691Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:32.5112257Z dist init r=1, world=2 2022-05-18T04:51:32.5117529Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:32.5747344Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:32.5799194Z dist init r=0, world=2 2022-05-18T04:51:32.5803623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:32.5804705Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:32.5830845Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:33.5980905Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:33.5981430Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:34.2745542Z ok (4.335s) 2022-05-18T04:51:34.2883407Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76607 2022-05-18T04:51:34.2990923Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76608 2022-05-18T04:51:36.8529491Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:36.8580104Z dist init r=0, world=2 2022-05-18T04:51:36.8584985Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:36.8635033Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:36.8686327Z dist init r=1, world=2 2022-05-18T04:51:36.8690815Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:36.8691935Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:36.8790614Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:37.8967621Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:37.8968156Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:38.6100226Z ok (4.335s) 2022-05-18T04:51:38.6238125Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76691 2022-05-18T04:51:38.6345444Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76692 2022-05-18T04:51:41.1874798Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:41.1885954Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:41.1925363Z dist init r=1, world=2 2022-05-18T04:51:41.1930735Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:41.1936930Z dist init r=0, world=2 2022-05-18T04:51:41.1941274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:41.1942515Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:41.2034151Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:42.2168741Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:42.2169298Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:42.8453676Z ok (4.235s) 2022-05-18T04:51:42.8591598Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76775 2022-05-18T04:51:42.8699666Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76776 2022-05-18T04:51:45.3806659Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:45.3856730Z dist init r=1, world=2 2022-05-18T04:51:45.3862097Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:45.4145105Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:45.4197207Z dist init r=0, world=2 2022-05-18T04:51:45.4201908Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:45.4203060Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:45.4270614Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:46.4486756Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:46.4487327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:47.1808558Z ok (4.335s) 2022-05-18T04:51:47.1945662Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76859 2022-05-18T04:51:47.2056578Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76860 2022-05-18T04:51:49.7484106Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:49.7535690Z dist init r=0, world=2 2022-05-18T04:51:49.7540288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:49.7545993Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:49.7595922Z dist init r=1, world=2 2022-05-18T04:51:49.7601250Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:49.7602080Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:49.7644052Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:50.7782517Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:50.7783039Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:51.4161720Z ok (4.235s) 2022-05-18T04:51:51.4299651Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76943 2022-05-18T04:51:51.4408289Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76944 2022-05-18T04:51:53.9955086Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:53.9996786Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:54.0006097Z dist init r=0, world=2 2022-05-18T04:51:54.0011448Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:54.0048337Z dist init r=1, world=2 2022-05-18T04:51:54.0053450Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:54.0054296Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:54.0114849Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:55.0206260Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:55.0206810Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:55.7520592Z ok (4.336s) 2022-05-18T04:51:55.7655812Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77027 2022-05-18T04:51:55.7763111Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77028 2022-05-18T04:51:58.3144777Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:58.3194846Z dist init r=1, world=2 2022-05-18T04:51:58.3199873Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:58.3318866Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:58.3369888Z dist init r=0, world=2 2022-05-18T04:51:58.3374517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:58.3375618Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:58.3404994Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:59.3496636Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:59.3497182Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:59.9871701Z ok (4.235s) 2022-05-18T04:52:00.0007342Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77111 2022-05-18T04:52:00.0115734Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77112 2022-05-18T04:52:02.6167320Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:02.6218354Z dist init r=0, world=2 2022-05-18T04:52:02.6223211Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:02.6293649Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:02.6345501Z dist init r=1, world=2 2022-05-18T04:52:02.6349774Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:02.6350952Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:02.6428000Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:03.6695763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:03.6696308Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:04.3225073Z ok (4.335s) 2022-05-18T04:52:04.3363038Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77195 2022-05-18T04:52:04.3472990Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77196 2022-05-18T04:52:06.9291654Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:06.9342179Z dist init r=0, world=2 2022-05-18T04:52:06.9347955Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:06.9381512Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:06.9432715Z dist init r=1, world=2 2022-05-18T04:52:06.9437034Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:06.9438176Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:06.9451085Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:07.9595019Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:07.9595569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:08.5596818Z ok (4.237s) 2022-05-18T04:52:08.5735263Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77279 2022-05-18T04:52:08.5842654Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77280 2022-05-18T04:52:11.1604592Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:11.1654380Z dist init r=0, world=2 2022-05-18T04:52:11.1659224Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:11.1683468Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:11.1733538Z dist init r=1, world=2 2022-05-18T04:52:11.1738408Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:11.1739407Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:11.1762018Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:12.2118777Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:12.2119313Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:12.8953064Z ok (4.335s) 2022-05-18T04:52:12.9091224Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77363 2022-05-18T04:52:12.9199064Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77364 2022-05-18T04:52:15.4690997Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:15.4743307Z dist init r=1, world=2 2022-05-18T04:52:15.4747733Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:15.4749322Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:15.4798232Z dist init r=0, world=2 2022-05-18T04:52:15.4802538Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:15.4803349Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:15.4852764Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:16.5179221Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:16.5179751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:17.1306300Z ok (4.235s) 2022-05-18T04:52:17.1444851Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77447 2022-05-18T04:52:17.1554195Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77448 2022-05-18T04:52:19.6940568Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:19.6990878Z dist init r=0, world=2 2022-05-18T04:52:19.6995678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:19.6996239Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:19.7047347Z dist init r=1, world=2 2022-05-18T04:52:19.7051809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:19.7052618Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:19.7099299Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:20.7145656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:20.7146189Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:21.3659428Z ok (4.235s) 2022-05-18T04:52:21.3799630Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77531 2022-05-18T04:52:21.3906787Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77532 2022-05-18T04:52:23.9359450Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:23.9411841Z dist init r=1, world=2 2022-05-18T04:52:23.9417511Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:23.9480006Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:23.9530029Z dist init r=0, world=2 2022-05-18T04:52:23.9534239Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:23.9535441Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:23.9622993Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:24.9904768Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:24.9905295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:25.7015663Z ok (4.335s) 2022-05-18T04:52:25.7153904Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77615 2022-05-18T04:52:25.7260720Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77616 2022-05-18T04:52:28.2886425Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:28.2938840Z dist init r=1, world=2 2022-05-18T04:52:28.2940199Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:28.2944752Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:28.2990988Z dist init r=0, world=2 2022-05-18T04:52:28.2995282Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:28.2996413Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:28.3048285Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:29.3522852Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:29.3523458Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:30.0370316Z ok (4.335s) 2022-05-18T04:52:30.0508937Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77699 2022-05-18T04:52:30.0617645Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77700 2022-05-18T04:52:32.6193987Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:32.6252940Z dist init r=1, world=2 2022-05-18T04:52:32.6258699Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:32.6310147Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:32.6362620Z dist init r=0, world=2 2022-05-18T04:52:32.6367283Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:32.6368493Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:32.6464120Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:33.6890377Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:33.6890915Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:34.3726456Z ok (4.335s) 2022-05-18T04:52:34.3863957Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77783 2022-05-18T04:52:34.3971186Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77784 2022-05-18T04:52:36.9742359Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:36.9753623Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:36.9801457Z dist init r=0, world=2 2022-05-18T04:52:36.9805715Z dist init r=1, world=2 2022-05-18T04:52:36.9806730Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:36.9810436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:36.9811275Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:36.9911005Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:38.0080129Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:38.0080675Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:38.7079146Z ok (4.335s) 2022-05-18T04:52:38.7218398Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77867 2022-05-18T04:52:38.7326698Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77868 2022-05-18T04:52:41.3252527Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:41.3311800Z dist init r=1, world=2 2022-05-18T04:52:41.3316353Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:41.3403891Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:41.3455648Z dist init r=0, world=2 2022-05-18T04:52:41.3460648Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:41.3462176Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:41.3522122Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:42.3610915Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:42.3611983Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:43.0435409Z ok (4.335s) 2022-05-18T04:52:43.0573360Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77951 2022-05-18T04:52:43.0681365Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77952 2022-05-18T04:52:45.6120995Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:45.6170630Z dist init r=0, world=2 2022-05-18T04:52:45.6175676Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:45.6221195Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:45.6279926Z dist init r=1, world=2 2022-05-18T04:52:45.6284815Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:45.6286110Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:45.6381018Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:46.6634083Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:46.6634943Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:47.3790553Z ok (4.335s) 2022-05-18T04:52:47.3928140Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78035 2022-05-18T04:52:47.4035862Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78036 2022-05-18T04:52:49.9650899Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:49.9695716Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:49.9703155Z dist init r=1, world=2 2022-05-18T04:52:49.9709243Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:49.9747771Z dist init r=0, world=2 2022-05-18T04:52:49.9752346Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:49.9753512Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:49.9812718Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:51.0274516Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:51.0275060Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:51.7151749Z ok (4.336s) 2022-05-18T04:52:51.7290410Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78119 2022-05-18T04:52:51.7399061Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78120 2022-05-18T04:52:54.2836785Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:54.2887483Z dist init r=1, world=2 2022-05-18T04:52:54.2892674Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:54.2950762Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:54.3007098Z dist init r=0, world=2 2022-05-18T04:52:54.3011580Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:54.3012636Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:54.3097995Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:55.3332010Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:55.3332544Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:56.0508431Z ok (4.335s) 2022-05-18T04:52:56.0646613Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78203 2022-05-18T04:52:56.0754353Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78204 2022-05-18T04:52:58.6199707Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:58.6249297Z dist init r=0, world=2 2022-05-18T04:52:58.6254337Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:52:58.6408938Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:58.6460720Z dist init r=1, world=2 2022-05-18T04:52:58.6465651Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:52:58.6466707Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:58.6561556Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:52:59.6957373Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:59.6957939Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:00.3862030Z ok (4.335s) 2022-05-18T04:53:00.4000022Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78287 2022-05-18T04:53:00.4108467Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78288 2022-05-18T04:53:02.9643933Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:02.9694408Z dist init r=1, world=2 2022-05-18T04:53:02.9699876Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:02.9969905Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:03.0021193Z dist init r=0, world=2 2022-05-18T04:53:03.0026140Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:03.0027266Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:03.0107536Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:04.0246644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:04.0247171Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:04.7217084Z ok (4.335s) 2022-05-18T04:53:04.7354317Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78371 2022-05-18T04:53:04.7460953Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78372 2022-05-18T04:53:07.2412399Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:07.2463214Z dist init r=1, world=2 2022-05-18T04:53:07.2468857Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:07.2561356Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:07.2611494Z dist init r=0, world=2 2022-05-18T04:53:07.2615987Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:07.2617134Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:07.2673873Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:08.2944898Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:08.2945464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:08.9567423Z ok (4.235s) 2022-05-18T04:53:08.9701434Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78455 2022-05-18T04:53:08.9812257Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78456 2022-05-18T04:53:11.5172524Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:11.5222952Z dist init r=1, world=2 2022-05-18T04:53:11.5228739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:11.5303521Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:11.5354635Z dist init r=0, world=2 2022-05-18T04:53:11.5359203Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:11.5360644Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:11.5433904Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:12.5591454Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:12.5592013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:13.1917910Z ok (4.235s) 2022-05-18T04:53:13.2055085Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78539 2022-05-18T04:53:13.2162543Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78540 2022-05-18T04:53:15.7695162Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:15.7746321Z dist init r=0, world=2 2022-05-18T04:53:15.7751823Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:15.7818570Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:15.7869886Z dist init r=1, world=2 2022-05-18T04:53:15.7874425Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:15.7875751Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:15.7957700Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:16.8342594Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:16.8343137Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:17.5271936Z ok (4.335s) 2022-05-18T04:53:17.5408359Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78623 2022-05-18T04:53:17.5515511Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78624 2022-05-18T04:53:20.0577431Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:20.0628382Z dist init r=1, world=2 2022-05-18T04:53:20.0633481Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:20.0644715Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:20.0695071Z dist init r=0, world=2 2022-05-18T04:53:20.0699319Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:20.0700542Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:20.0736934Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:21.0852796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:21.0853309Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:21.7622522Z ok (4.235s) 2022-05-18T04:53:21.7761709Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78707 2022-05-18T04:53:21.7870020Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78708 2022-05-18T04:53:24.3393479Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:24.3413912Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:24.3443467Z dist init r=0, world=2 2022-05-18T04:53:24.3448869Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:24.3472507Z dist init r=1, world=2 2022-05-18T04:53:24.3477350Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:24.3478734Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:24.3552123Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:25.3934262Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:25.3934778Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:26.0978705Z ok (4.335s) 2022-05-18T04:53:26.1115051Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78791 2022-05-18T04:53:26.1224778Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78792 2022-05-18T04:53:28.6893097Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:28.6904513Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:28.6951675Z dist init r=1, world=2 2022-05-18T04:53:28.6955036Z dist init r=0, world=2 2022-05-18T04:53:28.6957903Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:28.6959614Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:28.6960857Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:28.7061176Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:29.7567890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:29.7568611Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:30.4334794Z ok (4.335s) 2022-05-18T04:53:30.4471944Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78875 2022-05-18T04:53:30.4579282Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78876 2022-05-18T04:53:33.0045358Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:33.0056570Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:33.0095388Z dist init r=0, world=2 2022-05-18T04:53:33.0100102Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:33.0113578Z dist init r=1, world=2 2022-05-18T04:53:33.0119103Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:33.0120765Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:33.0204079Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:34.0548598Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:34.0549248Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:34.7688015Z ok (4.335s) 2022-05-18T04:53:34.7827399Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78959 2022-05-18T04:53:34.7935198Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78960 2022-05-18T04:53:37.3410912Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:37.3462106Z dist init r=1, world=2 2022-05-18T04:53:37.3467936Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:37.3528136Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:37.3583931Z dist init r=0, world=2 2022-05-18T04:53:37.3588342Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:37.3589677Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:37.3673760Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:38.3988521Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:38.3989470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:39.1044313Z ok (4.335s) 2022-05-18T04:53:39.1182960Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79043 2022-05-18T04:53:39.1290651Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79044 2022-05-18T04:53:41.6595469Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:41.6645748Z dist init r=0, world=2 2022-05-18T04:53:41.6650736Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:41.6707768Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:41.6764387Z dist init r=1, world=2 2022-05-18T04:53:41.6768762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:41.6770066Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:41.6856211Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:42.7331522Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:42.7332090Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:43.4399603Z ok (4.335s) 2022-05-18T04:53:43.4535227Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79127 2022-05-18T04:53:43.4644201Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79128 2022-05-18T04:53:45.9964032Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:46.0014444Z dist init r=0, world=2 2022-05-18T04:53:46.0019400Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:46.0151553Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:46.0203576Z dist init r=1, world=2 2022-05-18T04:53:46.0209261Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:46.0210325Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:46.0224520Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:47.0548643Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:47.0549160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:47.7752680Z ok (4.335s) 2022-05-18T04:53:47.7890274Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79211 2022-05-18T04:53:47.7997253Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79212 2022-05-18T04:53:50.3288394Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:50.3333067Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:50.3338005Z dist init r=0, world=2 2022-05-18T04:53:50.3343301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:50.3390916Z dist init r=1, world=2 2022-05-18T04:53:50.3395628Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:50.3396779Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:50.3446848Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:51.3803008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:51.3803536Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:52.0102507Z ok (4.235s) 2022-05-18T04:53:52.0240188Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79295 2022-05-18T04:53:52.0346850Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79296 2022-05-18T04:53:54.5846806Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:54.5886280Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:54.5896654Z dist init r=0, world=2 2022-05-18T04:53:54.5901887Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:54.5944647Z dist init r=1, world=2 2022-05-18T04:53:54.5949562Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:54.5950692Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:54.6005614Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:55.6378210Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:55.6378736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:56.3455964Z ok (4.335s) 2022-05-18T04:53:56.3592320Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79379 2022-05-18T04:53:56.3699191Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79380 2022-05-18T04:53:58.9144935Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:58.9196670Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:58.9203184Z dist init r=1, world=2 2022-05-18T04:53:58.9209046Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:53:58.9246923Z dist init r=0, world=2 2022-05-18T04:53:58.9251350Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:53:58.9252175Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:58.9312365Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:53:59.9820116Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:59.9821004Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:00.6807801Z ok (4.335s) 2022-05-18T04:54:00.6945139Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79463 2022-05-18T04:54:00.7054953Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79464 2022-05-18T04:54:03.2541686Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:03.2569287Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:03.2592578Z dist init r=0, world=2 2022-05-18T04:54:03.2597889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:03.2628235Z dist init r=1, world=2 2022-05-18T04:54:03.2633097Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:03.2634198Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:03.2701373Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:04.3123169Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:04.3123743Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:05.0161492Z ok (4.335s) 2022-05-18T04:54:05.0299233Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79547 2022-05-18T04:54:05.0407215Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79548 2022-05-18T04:54:07.5328370Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:07.5377303Z dist init r=0, world=2 2022-05-18T04:54:07.5382408Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:07.6077079Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:07.6128580Z dist init r=1, world=2 2022-05-18T04:54:07.6133395Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:07.6134479Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:07.6197585Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:08.6656719Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:08.6657222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:09.3515963Z ok (4.335s) 2022-05-18T04:54:09.3653249Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79631 2022-05-18T04:54:09.3761129Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79632 2022-05-18T04:54:11.9120759Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:11.9171087Z dist init r=1, world=2 2022-05-18T04:54:11.9176045Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:11.9392184Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:11.9444309Z dist init r=0, world=2 2022-05-18T04:54:11.9448904Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:11.9450007Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:11.9482705Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:12.9743623Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:12.9744399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:13.6867969Z ok (4.335s) 2022-05-18T04:54:13.7005236Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79715 2022-05-18T04:54:13.7111784Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79716 2022-05-18T04:54:16.2775976Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:16.2826957Z dist init r=0, world=2 2022-05-18T04:54:16.2832221Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:16.2956604Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:16.3008403Z dist init r=1, world=2 2022-05-18T04:54:16.3012699Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:16.3013864Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:16.3037066Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:17.3334760Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:17.3335280Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:18.0220661Z ok (4.335s) 2022-05-18T04:54:18.0357720Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79799 2022-05-18T04:54:18.0467432Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79800 2022-05-18T04:54:20.6031530Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:20.6082550Z dist init r=1, world=2 2022-05-18T04:54:20.6087565Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:20.6167766Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:20.6218904Z dist init r=0, world=2 2022-05-18T04:54:20.6223380Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:20.6224497Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:20.6292508Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:21.6505360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:21.6505909Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:22.3574790Z ok (4.335s) 2022-05-18T04:54:22.3713020Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79883 2022-05-18T04:54:22.3820156Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79884 2022-05-18T04:54:24.9724811Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:24.9776657Z dist init r=0, world=2 2022-05-18T04:54:24.9782135Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:24.9839352Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:24.9890833Z dist init r=1, world=2 2022-05-18T04:54:24.9895319Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:24.9896612Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:24.9987490Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:26.0414835Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:26.0415513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:26.6927302Z ok (4.335s) 2022-05-18T04:54:26.7061689Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
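The recurring "dist init r=…, world=2" lines and the store_based_barrier_key messages above are emitted while each spawned rank joins the process group. A minimal sketch of that handshake outside the PyTorch test harness; the backend, rendezvous settings, and function name run_rank are illustrative assumptions, not taken from the tests:

import os
import torch.distributed as dist
import torch.multiprocessing as mp

def run_rank(rank, world_size):
    # Rendezvous settings are illustrative; the tests use their own store.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    print(f"dist init r={rank}, world={world_size}")
    # init_process_group performs the store-based barrier that produces the
    # "Added key: store_based_barrier_key:1" / "Completed store-based barrier"
    # INFO messages once every rank has checked in.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    dist.barrier()
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run_rank, args=(2,), nprocs=2)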
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79967 2022-05-18T04:54:26.7170027Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79968 2022-05-18T04:54:29.2454829Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:29.2504775Z dist init r=0, world=2 2022-05-18T04:54:29.2510199Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:29.2751491Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:29.2803350Z dist init r=1, world=2 2022-05-18T04:54:29.2807714Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:29.2808781Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:29.2816453Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:30.3219126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:30.3219857Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:31.0278540Z ok (4.335s) 2022-05-18T04:54:31.0414364Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80051 2022-05-18T04:54:31.0522600Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80052 2022-05-18T04:54:33.5842908Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:33.5892691Z dist init r=0, world=2 2022-05-18T04:54:33.5897913Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:33.5925659Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:33.5977013Z dist init r=1, world=2 2022-05-18T04:54:33.5981305Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:33.5982380Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:33.6001255Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:34.6392574Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:34.6393111Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:35.2629315Z ok (4.235s) 2022-05-18T04:54:35.2766542Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80135 2022-05-18T04:54:35.2875977Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80136 2022-05-18T04:54:37.8217010Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:37.8236295Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:37.8269673Z dist init r=0, world=2 2022-05-18T04:54:37.8274842Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:37.8287428Z dist init r=1, world=2 2022-05-18T04:54:37.8291665Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:37.8292845Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:37.8378215Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:38.8576186Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:38.8576735Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:39.4983491Z ok (4.235s) 2022-05-18T04:54:39.5122104Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80219 2022-05-18T04:54:39.5231499Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80220 2022-05-18T04:54:42.0665077Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:42.0723628Z dist init r=0, world=2 2022-05-18T04:54:42.0728564Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:42.0767585Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:42.0818003Z dist init r=1, world=2 2022-05-18T04:54:42.0822648Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:42.0823463Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:42.0831454Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:43.0958273Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:43.0958816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:43.7337780Z ok (4.235s) 2022-05-18T04:54:43.7474999Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80303 2022-05-18T04:54:43.7581685Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80304 2022-05-18T04:54:46.3063901Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:46.3097440Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:46.3115552Z dist init r=0, world=2 2022-05-18T04:54:46.3120684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:46.3155553Z dist init r=1, world=2 2022-05-18T04:54:46.3160532Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:46.3161505Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:46.3224145Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:47.3586772Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:47.3587327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:48.0690157Z ok (4.335s) 2022-05-18T04:54:48.0825381Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80387 2022-05-18T04:54:48.0933413Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80388 2022-05-18T04:54:50.5904043Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:50.5954913Z dist init r=0, world=2 2022-05-18T04:54:50.5959327Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:50.6071769Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:50.6122142Z dist init r=1, world=2 2022-05-18T04:54:50.6126582Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:50.6127445Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:50.6164511Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:51.6275597Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:51.6276137Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:52.3040830Z ok (4.235s) 2022-05-18T04:54:52.3179480Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80471 2022-05-18T04:54:52.3288548Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80472 2022-05-18T04:54:54.8566671Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:54.8616502Z dist init r=0, world=2 2022-05-18T04:54:54.8621374Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:54.8660760Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:54.8719268Z dist init r=1, world=2 2022-05-18T04:54:54.8724456Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:54.8725470Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:54.8726165Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:55.9085303Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:55.9085832Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:56.6397367Z ok (4.335s) 2022-05-18T04:54:56.6535053Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80555 2022-05-18T04:54:56.6642263Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80556 2022-05-18T04:54:59.1930979Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:59.1981090Z dist init r=0, world=2 2022-05-18T04:54:59.1986917Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:54:59.2324703Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:59.2376695Z dist init r=1, world=2 2022-05-18T04:54:59.2381678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:54:59.2382880Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:54:59.2395564Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:00.2934808Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:00.2935388Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:00.9752082Z ok (4.335s) 2022-05-18T04:55:00.9893083Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
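The test names above encode the mixed-precision policy being exercised (parameter dtypes such as fp32 or fp64, reduce-only casting, prefetching before or after backward, and whether a ShardedGradScaler is used). A hedged sketch of such a configuration with the FSDP API as of this torch build; the model, dtype choices, and training-step comment are assumptions for illustration, not the test internals:

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

def build_fsdp_model(rank: int) -> FSDP:
    # Cast parameters and buffers for compute while reducing gradients in
    # fp32, loosely mirroring the "mp_only_reduce" style configurations.
    mp_policy = MixedPrecision(
        param_dtype=torch.float16,
        reduce_dtype=torch.float32,
        buffer_dtype=torch.float16,
    )
    return FSDP(nn.Linear(8, 8).cuda(rank), mixed_precision=mp_policy)

# The "*_sharded_grad_scaler" variants additionally scale the loss, roughly:
#   scaler = ShardedGradScaler()
#   scaler.scale(model(x).sum()).backward()
#   scaler.step(optimizer)
#   scaler.update()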
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80639 2022-05-18T04:55:01.0002302Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80640 2022-05-18T04:55:03.5700256Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:03.5708511Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:03.5757589Z dist init r=1, world=2 2022-05-18T04:55:03.5758195Z dist init r=0, world=2 2022-05-18T04:55:03.5763566Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:03.5764431Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:03.5765241Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:03.5765954Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:04.6196896Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:04.6197605Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:05.3112471Z ok (4.336s) 2022-05-18T04:55:05.3250254Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80723 2022-05-18T04:55:05.3357815Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80724 2022-05-18T04:55:07.8899287Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:07.8949871Z dist init r=1, world=2 2022-05-18T04:55:07.8954675Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:07.9130170Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:07.9182164Z dist init r=0, world=2 2022-05-18T04:55:07.9187212Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:07.9188266Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:07.9261233Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:08.9396115Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:08.9396935Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:09.6466709Z ok (4.335s) 2022-05-18T04:55:09.6603877Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80807 2022-05-18T04:55:09.6713671Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80808 2022-05-18T04:55:12.2322796Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:12.2373320Z dist init r=0, world=2 2022-05-18T04:55:12.2378580Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:12.2499928Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:12.2552408Z dist init r=1, world=2 2022-05-18T04:55:12.2557150Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:12.2558495Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:12.2583528Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:13.3038268Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:13.3038903Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:13.9822460Z ok (4.335s) 2022-05-18T04:55:13.9958990Z test_mixed_precision_no_reshard_after_forward (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80891 2022-05-18T04:55:14.0066797Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80892 2022-05-18T04:55:16.5486658Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:16.5519619Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:16.5536457Z dist init r=1, world=2 2022-05-18T04:55:16.5541786Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:16.5577936Z dist init r=0, world=2 2022-05-18T04:55:16.5582727Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:16.5584158Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:16.5645070Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:17.5948597Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:17.5949127Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:18.3174103Z ok (4.335s) 2022-05-18T04:55:18.3192933Z test_mixed_precision_resnet (__main__.TestFSDPMixedPrecisionSharded) 2022-05-18T04:55:18.3193437Z End to end test to ensure mixed precision + auto_wrap works ... skip: no torchvision (0.002s) 2022-05-18T04:55:18.3344016Z test_mp_batchnorm_convert_sync_bn_False (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80975 2022-05-18T04:55:18.3451769Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80976 2022-05-18T04:55:20.8891335Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:20.8941683Z dist init r=0, world=2 2022-05-18T04:55:20.8947654Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:20.9166880Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:20.9219710Z dist init r=1, world=2 2022-05-18T04:55:20.9224717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:20.9225576Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:20.9253889Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:21.9697412Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:21.9697922Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:23.1572698Z ok (4.838s) 2022-05-18T04:55:23.1721929Z test_mp_batchnorm_convert_sync_bn_True (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81059 2022-05-18T04:55:23.1830187Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81060 2022-05-18T04:55:25.7143110Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:25.7194299Z dist init r=0, world=2 2022-05-18T04:55:25.7199747Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:25.7395933Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:25.7448529Z dist init r=1, world=2 2022-05-18T04:55:25.7453368Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:25.7454511Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:25.7506538Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:26.7902564Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:26.7903120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:27.3935755Z ok (4.236s) 2022-05-18T04:55:27.4074878Z test_mp_embedding_default (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81143 2022-05-18T04:55:27.4181435Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81144 2022-05-18T04:55:29.9210657Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:29.9267954Z dist init r=1, world=2 2022-05-18T04:55:29.9273380Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:29.9365228Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:29.9417446Z dist init r=0, world=2 2022-05-18T04:55:29.9422464Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:29.9423397Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:29.9478364Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:30.9664257Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:30.9665152Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:30.9978431Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:55:30.9979010Z warnings.warn( 2022-05-18T04:55:31.0013748Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:55:31.0014310Z warnings.warn( 2022-05-18T04:55:31.7289861Z ok (4.335s) 2022-05-18T04:55:31.7427852Z test_mp_embedding_only_params_and_bufs (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81227 2022-05-18T04:55:31.7537990Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81228 2022-05-18T04:55:34.3393815Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:34.3433191Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:34.3446517Z dist init r=0, world=2 2022-05-18T04:55:34.3451786Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:34.3482741Z dist init r=1, world=2 2022-05-18T04:55:34.3486895Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:34.3488056Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:34.3555371Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:35.3665535Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:35.3666093Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:35.3975700Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:55:35.3976295Z warnings.warn( 2022-05-18T04:55:35.3979899Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:55:35.3980520Z warnings.warn( 2022-05-18T04:55:36.0648110Z ok (4.336s) 2022-05-18T04:55:36.0784953Z test_mp_embedding_params_and_reduce_diff (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81311 2022-05-18T04:55:36.0893071Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81312 2022-05-18T04:55:38.6421969Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:38.6457550Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:38.6471818Z dist init r=0, world=2 2022-05-18T04:55:38.6476908Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:38.6510207Z dist init r=1, world=2 2022-05-18T04:55:38.6515188Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:38.6516447Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:38.6580472Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:39.6933185Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:39.6934101Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:39.7215048Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:55:39.7215636Z warnings.warn( 2022-05-18T04:55:39.7219701Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:55:39.7220247Z warnings.warn( 2022-05-18T04:55:40.4001862Z ok (4.335s) 2022-05-18T04:55:40.4140809Z test_mp_embedding_reduce (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81395 2022-05-18T04:55:40.4252520Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81396 2022-05-18T04:55:42.9410543Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:42.9460751Z dist init r=0, world=2 2022-05-18T04:55:42.9466422Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:42.9907715Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:42.9960034Z dist init r=1, world=2 2022-05-18T04:55:42.9964780Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:42.9966041Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:42.9976281Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
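The UserWarning repeated in these embedding tests fires because FSDP is handed a module whose parameters still live on CPU, so it migrates them to the rank's GPU itself. A minimal, hypothetical sketch of the pattern that avoids the warning by placing the module on its target device before wrapping (module type and sizes are made up for illustration):

import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_device(rank: int) -> FSDP:
    # Already on the target GPU, so FSDP does not need to move it and the
    # "Module is input on CPU, we are moving it to <rank>" warning is not hit.
    module = nn.Embedding(100, 16).cuda(rank)
    return FSDP(module)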
2022-05-18T04:55:44.0252387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:44.0254289Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:44.0536331Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:55:44.0536904Z warnings.warn( 2022-05-18T04:55:44.0570040Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:55:44.0570607Z warnings.warn( 2022-05-18T04:55:45.3376835Z ok (4.937s) 2022-05-18T04:55:45.3521110Z test_mixed_precision_e2e_full_shard (__main__.TestFSDPMixedPrecisionUnsharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81479 2022-05-18T04:55:47.8787318Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:47.8838981Z dist init r=0, world=1 2022-05-18T04:55:47.8843809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:47.8844704Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:55:47.9340001Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:48.4599289Z ok (3.122s) 2022-05-18T04:55:48.4738165Z test_mixed_precision_no_reshard_after_forward (__main__.TestFSDPMixedPrecisionUnsharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81521 2022-05-18T04:55:50.9913845Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:50.9965245Z dist init r=0, world=1 2022-05-18T04:55:50.9970311Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:50.9971246Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:55:51.0471297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:51.5817357Z ok (3.122s) 2022-05-18T04:55:51.5817696Z 2022-05-18T04:55:51.5818388Z ---------------------------------------------------------------------- 2022-05-18T04:55:51.5819008Z Ran 74 tests in 313.699s 2022-05-18T04:55:51.5820388Z 2022-05-18T04:55:51.5820864Z OK (skipped=1) 2022-05-18T04:55:51.5821068Z 2022-05-18T04:55:51.5821204Z Generating XML reports... 2022-05-18T04:55:51.5928562Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_mixed_precision/TEST-TestFSDPMixedPrecisionSharded-20220518045037.xml 2022-05-18T04:55:51.5932171Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_mixed_precision/TEST-TestFSDPMixedPrecisionUnsharded-20220518045037.xml 2022-05-18T04:55:51.8713552Z Running distributed/fsdp/test_fsdp_summon_full_params ... [2022-05-18 04:55:51.870829] 2022-05-18T04:55:51.8714343Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_summon_full_params.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 04:55:51.870935] 2022-05-18T04:55:52.8043696Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params 2022-05-18T04:55:52.8069955Z 2022-05-18T04:55:52.8070391Z Running tests... 2022-05-18T04:55:52.8070892Z ---------------------------------------------------------------------- 2022-05-18T04:55:54.3817003Z test_cannot_summon_full_params_from_backward (__main__.TestSummonFullParams) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:54.4219711Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81598 2022-05-18T04:55:54.4331267Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81599 2022-05-18T04:55:55.3730350Z dist init r=0, world=2 2022-05-18T04:55:55.3734065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:55.3852232Z dist init r=1, world=2 2022-05-18T04:55:55.3857226Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:55.3858195Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:55.3939247Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:56.7355825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:56.7356378Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:56.7565066Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:55:56.7565724Z warnings.warn( 2022-05-18T04:55:56.7566473Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:55:56.7567014Z warnings.warn( 2022-05-18T04:55:56.9306276Z Asserting FSDP instance is: FullyShardedDataParallel( 2022-05-18T04:55:56.9306722Z (_fsdp_wrapped_module): FlattenParamsWrapper( 2022-05-18T04:55:56.9307107Z (_fpw_module): Linear(in_features=2, out_features=1, bias=True) 2022-05-18T04:55:56.9307406Z ) 2022-05-18T04:55:56.9307616Z ) 2022-05-18T04:55:56.9310664Z ERROR: expected to be in states [] but current state is TrainingState_.BACKWARD_PRE 2022-05-18T04:55:56.9311236Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py", line 222, in bad_backwards_hook 2022-05-18T04:55:56.9311955Z with model.summon_full_params(model): 2022-05-18T04:55:56.9312320Z File "/opt/conda/lib/python3.9/contextlib.py", line 119, in __enter__ 2022-05-18T04:55:56.9312636Z return next(self.gen) 2022-05-18T04:55:56.9313322Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2490, in summon_full_params 2022-05-18T04:55:56.9313739Z stack.enter_context( 2022-05-18T04:55:56.9314081Z File "/opt/conda/lib/python3.9/contextlib.py", line 448, in enter_context 2022-05-18T04:55:56.9314408Z result = _cm_type.__enter__(cm) 2022-05-18T04:55:56.9314765Z File "/opt/conda/lib/python3.9/contextlib.py", line 119, in __enter__ 2022-05-18T04:55:56.9315090Z return next(self.gen) 2022-05-18T04:55:56.9315629Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2339, in _summon_full_params 2022-05-18T04:55:56.9316044Z stack.enter_context( 2022-05-18T04:55:56.9316390Z File "/opt/conda/lib/python3.9/contextlib.py", line 448, in enter_context 2022-05-18T04:55:56.9316728Z result = _cm_type.__enter__(cm) 2022-05-18T04:55:56.9317072Z File "/opt/conda/lib/python3.9/contextlib.py", line 119, in __enter__ 2022-05-18T04:55:56.9317394Z return next(self.gen) 2022-05-18T04:55:56.9317947Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2354, in _summon_full_params 2022-05-18T04:55:56.9318379Z self._assert_state([TrainingState_.IDLE]) 2022-05-18T04:55:56.9318955Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 3298, in _assert_state 2022-05-18T04:55:56.9319365Z traceback.print_stack() 2022-05-18T04:55:57.2409955Z ok (4.434s) 2022-05-18T04:55:57.2550100Z test_cannot_summon_full_params_from_forward (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81681 2022-05-18T04:55:57.2657440Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81682 2022-05-18T04:55:58.1823924Z dist init r=0, world=2 2022-05-18T04:55:58.1828021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:58.1841643Z dist init r=1, world=2 2022-05-18T04:55:58.1846884Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:58.1847850Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:58.1931497Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:55:59.5178210Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:59.5190654Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:59.5191905Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:55:59.5192508Z warnings.warn( 2022-05-18T04:55:59.5193268Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:55:59.5193792Z warnings.warn( 2022-05-18T04:55:59.5229769Z Asserting FSDP instance is: FullyShardedDataParallel( 2022-05-18T04:55:59.5230346Z (_fsdp_wrapped_module): FlattenParamsWrapper( 2022-05-18T04:55:59.5230719Z (_fpw_module): MyModule() 2022-05-18T04:55:59.5230954Z ) 2022-05-18T04:55:59.5231164Z ) 2022-05-18T04:55:59.5231520Z ERROR: expected to be in states [] but current state is TrainingState_.FORWARD 2022-05-18T04:55:59.5241693Z File "", line 1, in 2022-05-18T04:55:59.5242253Z File "/opt/conda/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main 2022-05-18T04:55:59.5242817Z exitcode = _main(fd, parent_sentinel) 2022-05-18T04:55:59.5243198Z File "/opt/conda/lib/python3.9/multiprocessing/spawn.py", line 129, in _main 2022-05-18T04:55:59.5243555Z return self._bootstrap(parent_sentinel) 2022-05-18T04:55:59.5243949Z File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap 2022-05-18T04:55:59.5244307Z self.run() 2022-05-18T04:55:59.5244636Z File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run 2022-05-18T04:55:59.5245012Z self._target(*self._args, **self._kwargs) 2022-05-18T04:55:59.5245569Z File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_fsdp.py", line 429, in _run 2022-05-18T04:55:59.5245941Z self.run_test(test_name, pipe) 2022-05-18T04:55:59.5246485Z File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 618, in run_test 2022-05-18T04:55:59.5247117Z getattr(self, test_name)() 2022-05-18T04:55:59.5247646Z File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 499, in wrapper 2022-05-18T04:55:59.5248000Z fn() 2022-05-18T04:55:59.5248489Z File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 141, in wrapper 2022-05-18T04:55:59.5248884Z return func(*args, **kwargs) 2022-05-18T04:55:59.5249317Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py", line 213, in test_cannot_summon_full_params_from_forward 2022-05-18T04:55:59.5249732Z model(model) 2022-05-18T04:55:59.5250201Z File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl 2022-05-18T04:55:59.5250595Z return forward_call(*input, **kwargs) 2022-05-18T04:55:59.5251155Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2246, in forward 2022-05-18T04:55:59.5251595Z outputs = self.module(*args, **kwargs) 2022-05-18T04:55:59.5252096Z File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl 2022-05-18T04:55:59.5252491Z return 
forward_call(*input, **kwargs) 2022-05-18T04:55:59.5253014Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/flatten_params_wrapper.py", line 476, in forward 2022-05-18T04:55:59.5253441Z return self.module(*inputs, **kwinputs) 2022-05-18T04:55:59.5254081Z File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl 2022-05-18T04:55:59.5254474Z return forward_call(*input, **kwargs) 2022-05-18T04:55:59.5254888Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py", line 206, in forward 2022-05-18T04:55:59.5255317Z with fsdp_module.summon_full_params(fsdp_module): 2022-05-18T04:55:59.5255688Z File "/opt/conda/lib/python3.9/contextlib.py", line 119, in __enter__ 2022-05-18T04:55:59.5256020Z return next(self.gen) 2022-05-18T04:55:59.5256582Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2490, in summon_full_params 2022-05-18T04:55:59.5256997Z stack.enter_context( 2022-05-18T04:55:59.5257321Z File "/opt/conda/lib/python3.9/contextlib.py", line 448, in enter_context 2022-05-18T04:55:59.5257666Z result = _cm_type.__enter__(cm) 2022-05-18T04:55:59.5258011Z File "/opt/conda/lib/python3.9/contextlib.py", line 119, in __enter__ 2022-05-18T04:55:59.5258323Z return next(self.gen) 2022-05-18T04:55:59.5258881Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2339, in _summon_full_params 2022-05-18T04:55:59.5259296Z stack.enter_context( 2022-05-18T04:55:59.5259617Z File "/opt/conda/lib/python3.9/contextlib.py", line 448, in enter_context 2022-05-18T04:55:59.5260041Z result = _cm_type.__enter__(cm) 2022-05-18T04:55:59.5260392Z File "/opt/conda/lib/python3.9/contextlib.py", line 119, in __enter__ 2022-05-18T04:55:59.5260721Z return next(self.gen) 2022-05-18T04:55:59.5261256Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2354, in _summon_full_params 2022-05-18T04:55:59.5261704Z self._assert_state([TrainingState_.IDLE]) 2022-05-18T04:55:59.5262281Z File "/opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 3298, in _assert_state 2022-05-18T04:55:59.5262681Z traceback.print_stack() 2022-05-18T04:55:59.7725424Z ok (2.531s) 2022-05-18T04:55:59.7873244Z test_named_parameters_buffers_prefix__recurse_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81760 2022-05-18T04:55:59.7979899Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81761 2022-05-18T04:56:00.7215845Z dist init r=0, world=2 2022-05-18T04:56:00.7219579Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:00.7233149Z dist init r=1, world=2 2022-05-18T04:56:00.7238569Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:00.7239852Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:00.7322894Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
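Both tracebacks above come from the tests deliberately calling summon_full_params while the FSDP instance is inside a backward hook or its own forward, which _assert_state([TrainingState_.IDLE]) rejects. A hedged sketch of the supported pattern, invoking it between steps while the instance is idle (the inspection body is illustrative):

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def inspect_full_params(model: FSDP) -> None:
    # Legal only while the FSDP instance is idle, e.g. between training steps;
    # inside the context each rank sees the unsharded (full) parameters.
    with model.summon_full_params(model):
        for name, param in model.named_parameters():
            print(name, tuple(param.shape))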
2022-05-18T04:56:02.0848350Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:02.0848877Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:02.4051121Z ok (2.632s) 2022-05-18T04:56:02.4202673Z test_named_parameters_buffers_prefix__recurse_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81839 2022-05-18T04:56:02.4315885Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81840 2022-05-18T04:56:03.3412493Z dist init r=0, world=2 2022-05-18T04:56:03.3415763Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:03.3478259Z dist init r=1, world=2 2022-05-18T04:56:03.3483416Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:03.3484997Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:03.3519071Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:04.6978182Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:04.6978748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:05.0386738Z ok (2.633s) 2022-05-18T04:56:05.0537802Z test_named_parameters_buffers_prefix_test_prefix_recurse_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81918 2022-05-18T04:56:05.0645793Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81919 2022-05-18T04:56:05.9619987Z dist init r=0, world=2 2022-05-18T04:56:05.9623542Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:06.0169963Z dist init r=1, world=2 2022-05-18T04:56:06.0174714Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:06.0175867Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:06.0234243Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:07.3482912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:07.3483438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:07.6723436Z ok (2.633s) 2022-05-18T04:56:07.6867159Z test_named_parameters_buffers_prefix_test_prefix_recurse_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81997 2022-05-18T04:56:07.6974431Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81998 2022-05-18T04:56:08.6346553Z dist init r=1, world=2 2022-05-18T04:56:08.6350477Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:08.6494063Z dist init r=0, world=2 2022-05-18T04:56:08.6499048Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:08.6500250Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:08.6555634Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:09.9812398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:09.9812949Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:10.3043900Z ok (2.632s) 2022-05-18T04:56:10.3196708Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82076 2022-05-18T04:56:10.3305780Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82077 2022-05-18T04:56:11.2652778Z dist init r=0, world=2 2022-05-18T04:56:11.2656695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:11.2939465Z dist init r=1, world=2 2022-05-18T04:56:11.2944769Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:11.2945867Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:11.2963013Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:12.6354935Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:12.6355498Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:12.9377119Z ok (2.633s) 2022-05-18T04:56:12.9527052Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82155 2022-05-18T04:56:12.9634604Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82156 2022-05-18T04:56:13.8861962Z dist init r=0, world=2 2022-05-18T04:56:13.8865875Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:13.9375846Z dist init r=1, world=2 2022-05-18T04:56:13.9380680Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:13.9381984Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:13.9477885Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:56:15.2938668Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:15.2939519Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:15.5703258Z ok (2.633s) 2022-05-18T04:56:15.5854032Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82234 2022-05-18T04:56:15.5962496Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82235 2022-05-18T04:56:16.5525554Z dist init r=0, world=2 2022-05-18T04:56:16.5529103Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:16.5664432Z dist init r=1, world=2 2022-05-18T04:56:16.5669518Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:16.5670675Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:16.5734075Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:17.9095217Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:17.9095748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:17.9306196Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2310: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T04:56:17.9307035Z warnings.warn( 2022-05-18T04:56:17.9308022Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2310: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T04:56:17.9308704Z warnings.warn( 2022-05-18T04:56:18.2032879Z ok (2.633s) 2022-05-18T04:56:18.2182174Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82313 2022-05-18T04:56:18.2292367Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82314 2022-05-18T04:56:19.1413119Z dist init r=1, world=2 2022-05-18T04:56:19.1416684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:19.1462102Z dist init r=0, world=2 2022-05-18T04:56:19.1467473Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:19.1468593Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:19.1519902Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:56:20.4959141Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:20.4959703Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:20.5185292Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2310: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T04:56:20.5186008Z warnings.warn( 2022-05-18T04:56:20.5186992Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2310: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T04:56:20.5187962Z warnings.warn( 2022-05-18T04:56:20.8361688Z ok (2.633s) 2022-05-18T04:56:20.8511973Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82392 2022-05-18T04:56:20.8619736Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82393 2022-05-18T04:56:21.7657333Z dist init r=1, world=2 2022-05-18T04:56:21.7661375Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:21.8312850Z dist init r=0, world=2 2022-05-18T04:56:21.8318143Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:21.8318981Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:21.8374292Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:23.1893844Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:23.1894348Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:23.4689056Z ok (2.633s) 2022-05-18T04:56:23.4840323Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82471 2022-05-18T04:56:23.4947722Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82472 2022-05-18T04:56:24.4762145Z dist init r=1, world=2 2022-05-18T04:56:24.4765785Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:24.4768227Z dist init r=0, world=2 2022-05-18T04:56:24.4773060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:24.4774339Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:24.4869627Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
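The warning above recommends pairing offload_to_cpu with rank0_only=True so the full parameters are materialized in host memory on a single rank rather than on every rank of the same machine. A hedged sketch of that combination; the keyword names follow the warning and the parametrized test names, while the surrounding code is illustrative:

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def full_params_on_rank0(model: FSDP, rank: int) -> None:
    # Only rank 0 materializes the full parameters, offloaded to CPU, which is
    # the pairing the UserWarning suggests to avoid redundant host copies.
    with model.summon_full_params(model, rank0_only=True, offload_to_cpu=True):
        if rank == 0:
            for name, param in model.named_parameters():
                print(name, param.device, tuple(param.shape))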
2022-05-18T04:56:25.8142830Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:25.8143390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:26.1017998Z ok (2.633s) 2022-05-18T04:56:26.1169904Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82550 2022-05-18T04:56:26.1279829Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82551 2022-05-18T04:56:27.0651505Z dist init r=0, world=2 2022-05-18T04:56:27.0655236Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:27.0862425Z dist init r=1, world=2 2022-05-18T04:56:27.0867543Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:27.0868974Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:27.0962396Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:28.4192890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:28.4193753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:28.7350420Z ok (2.633s) 2022-05-18T04:56:28.7505230Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82629 2022-05-18T04:56:28.7613620Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82630 2022-05-18T04:56:29.6572643Z dist init r=0, world=2 2022-05-18T04:56:29.6576257Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:29.6659140Z dist init r=1, world=2 2022-05-18T04:56:29.6664031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:29.6665272Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:29.6679012Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:31.0218603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:31.0219116Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:31.3683241Z ok (2.633s) 2022-05-18T04:56:31.3834247Z test_params_count_and_value_rank0_only_False_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82708 2022-05-18T04:56:31.3941942Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82709 2022-05-18T04:56:32.3404706Z dist init r=0, world=2 2022-05-18T04:56:32.3408598Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:32.3547679Z dist init r=1, world=2 2022-05-18T04:56:32.3552863Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:32.3553815Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:32.3613647Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:33.6972952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:33.6973786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:34.0012188Z ok (2.633s) 2022-05-18T04:56:34.0161533Z test_params_count_and_value_rank0_only_False_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82787 2022-05-18T04:56:34.0269765Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82788 2022-05-18T04:56:34.9395521Z dist init r=1, world=2 2022-05-18T04:56:34.9398866Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:34.9924145Z dist init r=0, world=2 2022-05-18T04:56:34.9929462Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:34.9930947Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:35.0011292Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:36.3303950Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:36.3304656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:36.6338699Z ok (2.632s) 2022-05-18T04:56:36.6487645Z test_params_count_and_value_rank0_only_False_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82866 2022-05-18T04:56:36.6594978Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82867 2022-05-18T04:56:37.5535641Z dist init r=0, world=2 2022-05-18T04:56:37.5539183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:37.5632391Z dist init r=1, world=2 2022-05-18T04:56:37.5637600Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:37.5638902Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:37.5642062Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:56:38.9068859Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:38.9069415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:39.2663831Z ok (2.632s) 2022-05-18T04:56:39.2813157Z test_params_count_and_value_rank0_only_False_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82945 2022-05-18T04:56:39.2921130Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82946 2022-05-18T04:56:40.1891270Z dist init r=1, world=2 2022-05-18T04:56:40.1895207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:40.2043663Z dist init r=0, world=2 2022-05-18T04:56:40.2048989Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:40.2050334Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:40.2100000Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:41.5612021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:41.5612537Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:41.8990599Z ok (2.633s) 2022-05-18T04:56:41.9139225Z test_params_count_and_value_rank0_only_True_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83024 2022-05-18T04:56:41.9246734Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83025 2022-05-18T04:56:42.8210137Z dist init r=0, world=2 2022-05-18T04:56:42.8214789Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:42.8636285Z dist init r=1, world=2 2022-05-18T04:56:42.8640739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:42.8641769Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:42.8724031Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:44.1926542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:44.1927112Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:44.5316482Z ok (2.632s) 2022-05-18T04:56:44.5466277Z test_params_count_and_value_rank0_only_True_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83103 2022-05-18T04:56:44.5573501Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83104 2022-05-18T04:56:45.4726566Z dist init r=1, world=2 2022-05-18T04:56:45.4729917Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:45.4771481Z dist init r=0, world=2 2022-05-18T04:56:45.4776784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:45.4778146Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:45.4833303Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:46.8370363Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:46.8370900Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:47.1644176Z ok (2.633s) 2022-05-18T04:56:47.1794407Z test_params_count_and_value_rank0_only_True_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83182 2022-05-18T04:56:47.1904372Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83183 2022-05-18T04:56:48.1342352Z dist init r=0, world=2 2022-05-18T04:56:48.1346528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:48.1490044Z dist init r=1, world=2 2022-05-18T04:56:48.1494991Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:48.1496451Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:48.1552130Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:49.4819058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:49.4819601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:49.7973359Z ok (2.633s) 2022-05-18T04:56:49.8121865Z test_params_count_and_value_rank0_only_True_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83261 2022-05-18T04:56:49.8228721Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83262 2022-05-18T04:56:50.7389241Z dist init r=0, world=2 2022-05-18T04:56:50.7393354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:50.7740755Z dist init r=1, world=2 2022-05-18T04:56:50.7745552Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:50.7746915Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:50.7801604Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
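The `test_params_count_and_value_*` and `test_params_are_unflattenned_*` parametrizations above presumably compare the parameters seen inside `summon_full_params` against an identically initialized local model. A rough sketch of that style of check, under the same assumptions as the earlier sketches (process group initialized, one GPU per rank); the aggregate-sum comparison is just an illustrative stand-in for the real per-parameter checks:

```python
# Rough sketch of a "params count and value" style check: the summoned full
# parameters of the FSDP-wrapped model should agree with a local copy built
# from the same seed. Hypothetical structure; not the actual test body.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def build_model() -> torch.nn.Module:
    torch.manual_seed(0)  # identical initialization on every rank
    return torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.Linear(16, 4))


reference = build_model().cuda()
wrapped = FSDP(build_model().cuda())

with FSDP.summon_full_params(wrapped):
    # Inside the context the parameters are unflattened, so counts and values
    # should match the unwrapped reference model.
    assert sum(p.numel() for p in wrapped.parameters()) == \
        sum(p.numel() for p in reference.parameters())
    assert torch.allclose(
        sum(p.float().sum() for p in wrapped.parameters()),
        sum(p.float().sum() for p in reference.parameters()),
    )
```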
2022-05-18T04:56:52.1159452Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:52.1159962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:52.4298553Z ok (2.632s) 2022-05-18T04:56:52.4445459Z test_raises_rank0_with_writeback (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83340 2022-05-18T04:56:52.4553086Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83341 2022-05-18T04:56:53.4405871Z dist init r=0, world=2 2022-05-18T04:56:53.4409749Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:53.4410150Z dist init r=1, world=2 2022-05-18T04:56:53.4414613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:53.4415757Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:53.4514019Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:54.7957425Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:54.7957947Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:55.0623460Z ok (2.632s) 2022-05-18T04:56:55.0789026Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83419 2022-05-18T04:56:55.0895705Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83420 2022-05-18T04:56:56.0294119Z dist init r=0, world=2 2022-05-18T04:56:56.0297579Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:56.0311000Z dist init r=1, world=2 2022-05-18T04:56:56.0315978Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:56.0316892Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:56.0400924Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:57.3732636Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:57.3733162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:57.3924797Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:56:57.3925380Z warnings.warn( 2022-05-18T04:56:57.3926142Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:56:57.3926686Z warnings.warn( 2022-05-18T04:56:57.8970946Z ok (2.835s) 2022-05-18T04:56:57.9131463Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83502 2022-05-18T04:56:57.9237723Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83503 2022-05-18T04:56:58.8787943Z dist init r=1, world=2 2022-05-18T04:56:58.8788264Z dist init r=0, world=2 2022-05-18T04:56:58.8792105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:56:58.8792633Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:56:58.8793432Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:56:58.8794132Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:00.2221081Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:00.2221624Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:00.2444577Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:00.2445372Z warnings.warn( 2022-05-18T04:57:00.2446158Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:00.2446702Z warnings.warn( 2022-05-18T04:57:00.7311226Z ok (2.834s) 2022-05-18T04:57:00.7465029Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83585 2022-05-18T04:57:00.7572024Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83586 2022-05-18T04:57:01.6707374Z dist init r=1, world=2 2022-05-18T04:57:01.6711205Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:01.6788951Z dist init r=0, world=2 2022-05-18T04:57:01.6794113Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:01.6795270Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:01.6814346Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
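The other UserWarning in these runs, from fully_sharded_data_parallel.py:911, fires because the test hands FSDP a CPU-resident module, which FSDP temporarily moves to the rank's GPU for parameter verification, flattening, and sharding, then moves back. A hedged sketch of how user code avoids that round trip; the `device_id` variant is commented out because that argument may not exist in every build from this period:

```python
# Avoiding the ":911" warning above: give FSDP a module that is already on
# the local GPU (or, in builds that support it, pass device_id).
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

module = torch.nn.Linear(8, 8)

# Option 1: move the module to the local GPU before wrapping.
fsdp_model = FSDP(module.cuda())

# Option 2 (if the installed FSDP accepts device_id):
# fsdp_model = FSDP(torch.nn.Linear(8, 8), device_id=torch.cuda.current_device())
```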
2022-05-18T04:57:03.0495989Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:03.0496515Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:03.0684424Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:03.0685019Z warnings.warn( 2022-05-18T04:57:03.0685797Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:03.0686336Z warnings.warn( 2022-05-18T04:57:03.2481422Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2310: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T04:57:03.2482144Z warnings.warn( 2022-05-18T04:57:03.2485911Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2310: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T04:57:03.2486606Z warnings.warn( 2022-05-18T04:57:03.5646087Z ok (2.833s) 2022-05-18T04:57:03.5804432Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83668 2022-05-18T04:57:03.5912913Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83669 2022-05-18T04:57:04.5081603Z dist init r=1, world=2 2022-05-18T04:57:04.5085230Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:04.5124558Z dist init r=0, world=2 2022-05-18T04:57:04.5129864Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:04.5130951Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:04.5188038Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:05.8720697Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:05.8721220Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:05.8924386Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:57:05.8924977Z warnings.warn( 2022-05-18T04:57:05.8960975Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:05.8961524Z warnings.warn( 2022-05-18T04:57:06.0729562Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2310: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T04:57:06.0730298Z warnings.warn( 2022-05-18T04:57:06.0731851Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2310: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T04:57:06.0732527Z warnings.warn( 2022-05-18T04:57:06.3986909Z ok (2.834s) 2022-05-18T04:57:06.4143060Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83751 2022-05-18T04:57:06.4250413Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83752 2022-05-18T04:57:07.3413608Z dist init r=1, world=2 2022-05-18T04:57:07.3417334Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:07.3997439Z dist init r=0, world=2 2022-05-18T04:57:07.4002869Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:07.4003758Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:07.4027602Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:08.7753857Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:08.7754388Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:08.7964139Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:08.7964734Z warnings.warn( 2022-05-18T04:57:08.7965518Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:08.7966052Z warnings.warn( 2022-05-18T04:57:09.2324550Z ok (2.834s) 2022-05-18T04:57:09.2480447Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83834 2022-05-18T04:57:09.2587398Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83835 2022-05-18T04:57:10.1818482Z dist init r=0, world=2 2022-05-18T04:57:10.1822088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:10.2297016Z dist init r=1, world=2 2022-05-18T04:57:10.2301930Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:10.2303100Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:10.2332122Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:11.5885216Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:11.5885773Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:11.6084677Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:11.6085246Z warnings.warn( 2022-05-18T04:57:11.6086016Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:11.6086549Z warnings.warn( 2022-05-18T04:57:12.0661738Z ok (2.834s) 2022-05-18T04:57:12.0818378Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83917 2022-05-18T04:57:12.0927968Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83918 2022-05-18T04:57:13.0822687Z dist init r=0, world=2 2022-05-18T04:57:13.0822991Z dist init r=1, world=2 2022-05-18T04:57:13.0826508Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:13.0827347Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:13.0828423Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:13.0829168Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:14.4410506Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:14.4411052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:14.4604211Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:57:14.4604765Z warnings.warn( 2022-05-18T04:57:14.4605546Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:14.4606085Z warnings.warn( 2022-05-18T04:57:14.9001037Z ok (2.834s) 2022-05-18T04:57:14.9157251Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84000 2022-05-18T04:57:14.9264054Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84001 2022-05-18T04:57:15.8393964Z dist init r=1, world=2 2022-05-18T04:57:15.8397354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:15.8428208Z dist init r=0, world=2 2022-05-18T04:57:15.8433275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:15.8434221Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:15.8500651Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:17.2134551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:17.2135104Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:17.2364823Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:17.2365387Z warnings.warn( 2022-05-18T04:57:17.2366151Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:17.2366692Z warnings.warn( 2022-05-18T04:57:17.7338802Z ok (2.834s) 2022-05-18T04:57:17.7481090Z test_summon_from_non_fsdp (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84083 2022-05-18T04:57:17.7590819Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84084 2022-05-18T04:57:18.7186663Z dist init r=1, world=2 2022-05-18T04:57:18.7191030Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:18.7301406Z dist init r=0, world=2 2022-05-18T04:57:18.7306737Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:18.7307837Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:18.7395936Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
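For `test_summon_from_non_fsdp` just below: `summon_full_params` is a static context manager, so it can presumably be entered on a plain module whose children are FSDP instances, gathering the full parameters of every nested FSDP unit. A hedged sketch under that reading of the 2022-era API, with the same dist-init assumptions as above:

```python
# Sketch for the "summon from non-FSDP" case: the root module is a plain
# nn.Module; only its child is FSDP-wrapped. Hedged reading of the API.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

inner = FSDP(torch.nn.Linear(8, 8).cuda())
outer = torch.nn.Sequential(inner, torch.nn.ReLU())  # outer is NOT FSDP

with FSDP.summon_full_params(outer):
    # Expect the original, unflattened shapes here: (8, 8) and (8,).
    shapes = [tuple(p.shape) for p in outer.parameters()]
```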
2022-05-18T04:57:20.0678333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:20.0678913Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:20.3659684Z ok (2.632s) 2022-05-18T04:57:20.3809672Z test_summon_full_param_recursive_recurse_False_summon_outer_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84162 2022-05-18T04:57:20.3918288Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84163 2022-05-18T04:57:21.3428695Z dist init r=1, world=2 2022-05-18T04:57:21.3432112Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:21.3646963Z dist init r=0, world=2 2022-05-18T04:57:21.3651688Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:21.3652591Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:21.3739197Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:22.7006277Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:22.7006815Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:22.7204100Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:22.7204656Z warnings.warn( 2022-05-18T04:57:22.7205438Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:22.7205973Z warnings.warn( 2022-05-18T04:57:22.9988370Z ok (2.633s) 2022-05-18T04:57:23.0137501Z test_summon_full_param_recursive_recurse_False_summon_outer_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84241 2022-05-18T04:57:23.0244931Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84242 2022-05-18T04:57:23.9328588Z dist init r=1, world=2 2022-05-18T04:57:23.9332419Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:23.9611356Z dist init r=0, world=2 2022-05-18T04:57:23.9616169Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:23.9616974Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:23.9638716Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
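The `test_summon_full_param_recursive_recurse_*_summon_outer_*` parametrizations that follow exercise nested FSDP wrapping: `recurse` controls whether the summon descends into inner FSDP instances, and the tests enter the context on either the outer or the inner wrapper. A hedged sketch of the two modes, assuming the same initialized two-rank setup:

```python
# Sketch for the recursive/summon_outer parametrizations: with nested FSDP,
# recurse=False summons only the instance the context is entered on, leaving
# inner FSDP instances sharded (hedged reading of the 2022-era behavior).
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

inner = FSDP(torch.nn.Linear(8, 8).cuda())
outer = FSDP(torch.nn.Sequential(inner, torch.nn.Linear(8, 4).cuda()))

# Summon only the outer wrapper's own parameters:
with FSDP.summon_full_params(outer, recurse=False):
    pass

# Summon everything, including the nested FSDP instance:
with FSDP.summon_full_params(outer, recurse=True):
    pass
```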
2022-05-18T04:57:25.3009383Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:25.3010387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:25.3205095Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:25.3206287Z warnings.warn( 2022-05-18T04:57:25.3207778Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:25.3208875Z warnings.warn( 2022-05-18T04:57:25.6314271Z ok (2.632s) 2022-05-18T04:57:25.6463002Z test_summon_full_param_recursive_recurse_False_summon_outer_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84320 2022-05-18T04:57:25.6570109Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84321 2022-05-18T04:57:26.5653333Z dist init r=0, world=2 2022-05-18T04:57:26.5656898Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:26.5820466Z dist init r=1, world=2 2022-05-18T04:57:26.5825183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:26.5826596Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:26.5861917Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:27.9020016Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:27.9020541Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:27.9244283Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:27.9245191Z warnings.warn( 2022-05-18T04:57:27.9245956Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:27.9246491Z warnings.warn( 2022-05-18T04:57:28.2639768Z ok (2.632s) 2022-05-18T04:57:28.2789311Z test_summon_full_param_recursive_recurse_False_summon_outer_True_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84399 2022-05-18T04:57:28.2896713Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84400 2022-05-18T04:57:29.2058265Z dist init r=1, world=2 2022-05-18T04:57:29.2062080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:29.2477059Z dist init r=0, world=2 2022-05-18T04:57:29.2481699Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:29.2482682Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:29.2572629Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:30.5913948Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:30.5914495Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:30.6124577Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:30.6125172Z warnings.warn( 2022-05-18T04:57:30.6125936Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:30.6126473Z warnings.warn( 2022-05-18T04:57:30.8966166Z ok (2.632s) 2022-05-18T04:57:30.9115526Z test_summon_full_param_recursive_recurse_True_summon_outer_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84478 2022-05-18T04:57:30.9224593Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84479 2022-05-18T04:57:31.8698907Z dist init r=1, world=2 2022-05-18T04:57:31.8702707Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:31.8910411Z dist init r=0, world=2 2022-05-18T04:57:31.8915469Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:31.8916838Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:31.9009707Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:33.2236691Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:33.2237231Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:33.2484427Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:57:33.2485005Z warnings.warn( 2022-05-18T04:57:33.2485745Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:33.2486490Z warnings.warn( 2022-05-18T04:57:33.5293347Z ok (2.633s) 2022-05-18T04:57:33.5442207Z test_summon_full_param_recursive_recurse_True_summon_outer_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84557 2022-05-18T04:57:33.5550256Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84558 2022-05-18T04:57:34.4716115Z dist init r=0, world=2 2022-05-18T04:57:34.4720100Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:34.5122131Z dist init r=1, world=2 2022-05-18T04:57:34.5127599Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:34.5128604Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:34.5129330Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:35.8584475Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:35.8585028Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:35.8804442Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:35.8805041Z warnings.warn( 2022-05-18T04:57:35.8805809Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:35.8806360Z warnings.warn( 2022-05-18T04:57:36.1619888Z ok (2.632s) 2022-05-18T04:57:36.1769925Z test_summon_full_param_recursive_recurse_True_summon_outer_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84636 2022-05-18T04:57:36.1877498Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84637 2022-05-18T04:57:37.0956356Z dist init r=0, world=2 2022-05-18T04:57:37.0960680Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:37.1544578Z dist init r=1, world=2 2022-05-18T04:57:37.1549250Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:37.1550473Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:37.1572367Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:57:38.5346726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:38.5347229Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:38.5564282Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:38.5564874Z warnings.warn( 2022-05-18T04:57:38.5565640Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:38.5566175Z warnings.warn( 2022-05-18T04:57:38.7949102Z ok (2.633s) 2022-05-18T04:57:38.8096947Z test_summon_full_param_recursive_recurse_True_summon_outer_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84715 2022-05-18T04:57:38.8204589Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84716 2022-05-18T04:57:39.7439795Z dist init r=1, world=2 2022-05-18T04:57:39.7441108Z dist init r=0, world=2 2022-05-18T04:57:39.7444196Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:39.7445767Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:39.7446366Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:39.7447050Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:41.0930292Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:41.0930819Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:41.1124706Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:41.1125260Z warnings.warn( 2022-05-18T04:57:41.1160352Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:57:41.1160895Z warnings.warn( 2022-05-18T04:57:41.4273673Z ok (2.632s) 2022-05-18T04:57:41.4420416Z test_summon_full_param_shard_value_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84794 2022-05-18T04:57:41.4528030Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84795 2022-05-18T04:57:42.3403961Z dist init r=0, world=2 2022-05-18T04:57:42.3407445Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:42.3897970Z dist init r=1, world=2 2022-05-18T04:57:42.3903183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:42.3904524Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:42.3917075Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:43.7124433Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:43.7124997Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:44.0597195Z ok (2.632s) 2022-05-18T04:57:44.0743835Z test_summon_full_param_shard_value_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84873 2022-05-18T04:57:44.0851604Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84874 2022-05-18T04:57:45.0027772Z dist init r=0, world=2 2022-05-18T04:57:45.0032088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:45.0435229Z dist init r=1, world=2 2022-05-18T04:57:45.0440171Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:45.0440990Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:45.0441691Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:46.3996429Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:46.3996960Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:46.6921519Z ok (2.632s) 2022-05-18T04:57:46.7061819Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_False_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84952 2022-05-18T04:57:46.7171538Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84953 2022-05-18T04:57:47.6321329Z dist init r=0, world=2 2022-05-18T04:57:47.6324912Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:47.6797237Z dist init r=1, world=2 2022-05-18T04:57:47.6802480Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:47.6803417Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:47.6834489Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
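The `test_summon_full_param_writeback_*_cpu_offload_CPUOffload(offload_params=...)_*` cases that follow combine two knobs: `CPUOffload` keeps the sharded parameters on CPU between uses, and `writeback` decides whether edits made to the full parameters inside the context are copied back into the shards on exit. A hedged sketch of both, with the usual assumption that the process group is already initialized:

```python
# Sketch for the writeback/cpu_offload parametrizations: writeback=True makes
# in-context edits persist to the shards; writeback=False discards them.
import torch
from torch.distributed.fsdp import CPUOffload
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

model = FSDP(
    torch.nn.Linear(8, 8).cuda(),
    cpu_offload=CPUOffload(offload_params=True),
)

with FSDP.summon_full_params(model, writeback=True):
    with torch.no_grad():
        for p in model.parameters():
            p.zero_()   # persists to the shards after the context exits

with FSDP.summon_full_params(model, writeback=False):
    with torch.no_grad():
        for p in model.parameters():
            p.add_(1.0)  # discarded when the context exits
```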
2022-05-18T04:57:49.0331619Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:49.3240897Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:49.3241279Z ok (2.632s) 2022-05-18T04:57:49.3381653Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_False_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85031 2022-05-18T04:57:49.3490872Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85032 2022-05-18T04:57:50.2635691Z dist init r=1, world=2 2022-05-18T04:57:50.2639792Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:50.2839570Z dist init r=0, world=2 2022-05-18T04:57:50.2844661Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:50.2845436Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:50.2846400Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:51.6267334Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:51.6267877Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:51.9561460Z ok (2.632s) 2022-05-18T04:57:51.9701657Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_True_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85110 2022-05-18T04:57:51.9809617Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85111 2022-05-18T04:57:52.8946753Z dist init r=0, world=2 2022-05-18T04:57:52.8950826Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:52.9366662Z dist init r=1, world=2 2022-05-18T04:57:52.9371648Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:52.9372451Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:52.9461097Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:54.2865188Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:54.2865722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:54.5879191Z ok (2.632s) 2022-05-18T04:57:54.6018235Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_True_modify_outer_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85189 2022-05-18T04:57:54.6127284Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85190 2022-05-18T04:57:55.5384037Z dist init r=0, world=2 2022-05-18T04:57:55.5387753Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:55.5766195Z dist init r=1, world=2 2022-05-18T04:57:55.5772036Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:55.5772868Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:55.5795742Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:56.8991109Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:56.8991642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:57.2197192Z ok (2.632s) 2022-05-18T04:57:57.2338302Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_False_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85268 2022-05-18T04:57:57.2450459Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85269 2022-05-18T04:57:58.1758572Z dist init r=1, world=2 2022-05-18T04:57:58.1762861Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:58.2031660Z dist init r=0, world=2 2022-05-18T04:57:58.2036085Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:58.2036976Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:58.2069668Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:59.5364911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:59.5365424Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:59.8518413Z ok (2.632s) 2022-05-18T04:57:59.8659700Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_False_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85347 2022-05-18T04:57:59.8768224Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85348 2022-05-18T04:58:00.7928779Z dist init r=1, world=2 2022-05-18T04:58:00.7932612Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:00.8441670Z dist init r=0, world=2 2022-05-18T04:58:00.8447676Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:00.8448491Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:00.8544300Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
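Finally, the `mixed_precision_True` suffix on these test names corresponds to wrapping with an FSDP mixed-precision policy. A hedged sketch of that configuration; the `MixedPrecision` field names follow the 2022-era API and the behavior claim in the comment is an assumption about what these tests verify, not something stated in the log:

```python
# Sketch for the mixed_precision_True parametrizations: an FSDP MixedPrecision
# policy casts compute to the configured low-precision dtypes, while
# summon_full_params is expected to expose the full-precision parameters
# (hedged; presumably what these tests check).
import torch
from torch.distributed.fsdp import MixedPrecision
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

mp_policy = MixedPrecision(
    param_dtype=torch.float16,
    reduce_dtype=torch.float16,
    buffer_dtype=torch.float16,
)

model = FSDP(torch.nn.Linear(8, 8).cuda(), mixed_precision=mp_policy)

with FSDP.summon_full_params(model):
    dtypes = {p.dtype for p in model.parameters()}
```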
2022-05-18T04:58:02.1981990Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:02.1982867Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:02.4838088Z ok (2.632s) 2022-05-18T04:58:02.4979401Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_True_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85426 2022-05-18T04:58:02.5088975Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85427 2022-05-18T04:58:03.4418178Z dist init r=0, world=2 2022-05-18T04:58:03.4421982Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:03.4717208Z dist init r=1, world=2 2022-05-18T04:58:03.4722445Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:03.4723249Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:03.4728754Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:04.8124629Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:05.1159172Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:05.1159571Z ok (2.632s) 2022-05-18T04:58:05.1299325Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_True_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85505 2022-05-18T04:58:05.1408387Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85506 2022-05-18T04:58:06.0384665Z dist init r=0, world=2 2022-05-18T04:58:06.0388268Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:06.0549204Z dist init r=1, world=2 2022-05-18T04:58:06.0554955Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:06.0571068Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:06.0593678Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:07.3997862Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:07.3998925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:07.7478467Z ok (2.632s) 2022-05-18T04:58:07.7618808Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_False_modify_outer_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85584 2022-05-18T04:58:07.7729439Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85585 2022-05-18T04:58:08.7186286Z dist init r=1, world=2 2022-05-18T04:58:08.7190594Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:08.7422339Z dist init r=0, world=2 2022-05-18T04:58:08.7427864Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:08.7428985Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:08.7497643Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:10.0949207Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:10.0950061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:10.4801088Z ok (2.732s) 2022-05-18T04:58:10.4941868Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_False_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85663 2022-05-18T04:58:10.5049930Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85664 2022-05-18T04:58:11.4211619Z dist init r=1, world=2 2022-05-18T04:58:11.4214848Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:11.4333985Z dist init r=0, world=2 2022-05-18T04:58:11.4338981Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:11.4340096Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:11.4419898Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:12.7803957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:12.7804489Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:13.1120507Z ok (2.632s) 2022-05-18T04:58:13.1260700Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_True_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85742 2022-05-18T04:58:13.1368302Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85743 2022-05-18T04:58:14.0397492Z dist init r=0, world=2 2022-05-18T04:58:14.0400837Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:14.0806791Z dist init r=1, world=2 2022-05-18T04:58:14.0811633Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:14.0812933Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:14.0911238Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:58:15.4140253Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:15.4141110Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:15.7438662Z ok (2.632s) 2022-05-18T04:58:15.7578704Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_True_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85821 2022-05-18T04:58:15.7686576Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85822 2022-05-18T04:58:16.6886677Z dist init r=1, world=2 2022-05-18T04:58:16.6890109Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:16.6908821Z dist init r=0, world=2 2022-05-18T04:58:16.6914678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:16.6916109Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:16.6993810Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:18.0307569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:18.0308642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:18.3757101Z ok (2.632s) 2022-05-18T04:58:18.3897899Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_False_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85900 2022-05-18T04:58:18.4007276Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85901 2022-05-18T04:58:19.3188559Z dist init r=0, world=2 2022-05-18T04:58:19.3192314Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:19.3520971Z dist init r=1, world=2 2022-05-18T04:58:19.3525837Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:19.3526876Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:19.3600780Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:20.7137313Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:20.7137868Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:21.0076815Z ok (2.632s) 2022-05-18T04:58:21.0218118Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_False_modify_outer_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85979 2022-05-18T04:58:21.0327091Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85980 2022-05-18T04:58:21.9866320Z dist init r=0, world=2 2022-05-18T04:58:21.9870352Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:21.9896984Z dist init r=1, world=2 2022-05-18T04:58:21.9902534Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:21.9904214Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:21.9974060Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:23.3268635Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:23.3269611Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:23.6397358Z ok (2.632s) 2022-05-18T04:58:23.6536949Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_True_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86058 2022-05-18T04:58:23.6645808Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86059 2022-05-18T04:58:24.5764976Z dist init r=0, world=2 2022-05-18T04:58:24.5768317Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:24.5798716Z dist init r=1, world=2 2022-05-18T04:58:24.5803696Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:24.5805019Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:24.5871757Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:25.9259475Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:26.2715846Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:26.2716229Z ok (2.632s) 2022-05-18T04:58:26.2855759Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_True_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86137 2022-05-18T04:58:26.2962980Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86138 2022-05-18T04:58:27.2447201Z dist init r=1, world=2 2022-05-18T04:58:27.2451177Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:27.2474600Z dist init r=0, world=2 2022-05-18T04:58:27.2479661Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:27.2480903Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:27.2554552Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
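The test_summon_full_param_writeback_* cases above and below walk the full grid of writeback x cpu_offload(offload_params) x mixed_precision x modify_outer, with the parameter values baked into each generated test name. A minimal sketch of how such a grid is typically declared with the parametrize / instantiate_parametrized_tests helpers from torch.testing._internal.common_utils (the class, test body, and parameter names here are placeholders, not the real FSDP test):

from torch.testing._internal.common_utils import (
    TestCase, parametrize, instantiate_parametrized_tests, run_tests,
)

class TestWritebackGrid(TestCase):
    # Stacked @parametrize decorators multiply the test: 2 * 2 * 2 = 8 generated
    # methods here, each with its parameter names/values appended to the test
    # name, which is how names like the ones in the log above are produced.
    @parametrize("writeback", [False, True])
    @parametrize("mixed_precision", [False, True])
    @parametrize("modify_outer", [False, True])
    def test_writeback_grid(self, writeback, mixed_precision, modify_outer):
        # Placeholder body; the real tests wrap a model in FSDP and check whether
        # edits made under summon_full_params() are written back to the shards.
        self.assertIn(writeback, (False, True))

instantiate_parametrized_tests(TestWritebackGrid)

if __name__ == "__main__":
    run_tests()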
2022-05-18T04:58:28.6115971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:28.6116514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:28.9033490Z ok (2.632s) 2022-05-18T04:58:28.9187900Z test_summon_full_params_equivalence_rank0_only_False_offload_to_cpu_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86216 2022-05-18T04:58:28.9298442Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86217 2022-05-18T04:58:29.8505550Z dist init r=1, world=2 2022-05-18T04:58:29.8509764Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:29.8768438Z dist init r=0, world=2 2022-05-18T04:58:29.8773249Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:29.8774326Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:29.8817079Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:31.2045732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:31.2046267Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:31.5367760Z ok (2.633s) 2022-05-18T04:58:31.5523496Z test_summon_full_params_equivalence_rank0_only_False_offload_to_cpu_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86295 2022-05-18T04:58:31.5632289Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86296 2022-05-18T04:58:32.4695343Z dist init r=0, world=2 2022-05-18T04:58:32.4698943Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:32.4748210Z dist init r=1, world=2 2022-05-18T04:58:32.4753755Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:32.4754557Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:32.4802373Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:33.8094752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:33.8095299Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:33.8324255Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2310: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T04:58:33.8325443Z warnings.warn( 2022-05-18T04:58:33.8360562Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2310: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. 
It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T04:58:33.8361257Z warnings.warn( 2022-05-18T04:58:34.0700740Z ok (2.533s) 2022-05-18T04:58:34.0844475Z test_summon_full_params_equivalence_rank0_only_True_offload_to_cpu_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86374 2022-05-18T04:58:34.0951872Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86375 2022-05-18T04:58:35.0461473Z dist init r=0, world=2 2022-05-18T04:58:35.0464894Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:35.0482819Z dist init r=1, world=2 2022-05-18T04:58:35.0488038Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:35.0489234Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:35.0568692Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:36.4022904Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:36.4023431Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:36.7022628Z ok (2.632s) 2022-05-18T04:58:36.7167196Z test_summon_full_params_equivalence_rank0_only_True_offload_to_cpu_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86453 2022-05-18T04:58:36.7275685Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86454 2022-05-18T04:58:37.6394856Z dist init r=1, world=2 2022-05-18T04:58:37.6398217Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:37.6472882Z dist init r=0, world=2 2022-05-18T04:58:37.6477891Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:37.6478975Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:37.6501346Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:39.0060849Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:39.0061494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:39.3345178Z ok (2.632s) 2022-05-18T04:58:39.3491162Z test_summon_full_params_respects_reshard_after_forward_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86532 2022-05-18T04:58:39.3602381Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86533 2022-05-18T04:58:40.3061388Z dist init r=0, world=2 2022-05-18T04:58:40.3065052Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:40.3127051Z dist init r=1, world=2 2022-05-18T04:58:40.3131982Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:40.3133340Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
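The UserWarning above (fully_sharded_data_parallel.py:2310) fires because summon_full_params() was asked for offload_to_cpu=True while rank0_only=False, so every rank on the host copies the full parameters into CPU memory. A minimal sketch of the recommended pairing, assuming a process group is already initialized with one GPU per rank and that the module already lives on that GPU; the exact summon_full_params signature has shifted between releases, so treat this as illustrative rather than the test's own code:

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def count_full_params_on_rank0(module: torch.nn.Module) -> int:
    fsdp_model = FSDP(module)
    # rank0_only=True gathers the unsharded parameters only on rank 0, and
    # offload_to_cpu=True keeps that single copy in host memory, which is the
    # combination the warning above recommends. writeback is disabled because
    # non-zero ranks never see the full parameters to write back.
    with FSDP.summon_full_params(
        fsdp_model, rank0_only=True, offload_to_cpu=True, writeback=False
    ):
        if dist.get_rank() == 0:
            return sum(p.numel() for p in fsdp_model.parameters())
    return 0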
2022-05-18T04:58:40.3168285Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:41.6533422Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:41.6533952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:41.6724293Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:58:41.6724896Z warnings.warn( 2022-05-18T04:58:41.6725736Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:58:41.6726415Z warnings.warn( 2022-05-18T04:58:42.1676280Z ok (2.833s) 2022-05-18T04:58:42.1821480Z test_summon_full_params_respects_reshard_after_forward_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86611 2022-05-18T04:58:42.1930111Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86612 2022-05-18T04:58:43.0992974Z dist init r=0, world=2 2022-05-18T04:58:43.0997564Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:43.1078980Z dist init r=1, world=2 2022-05-18T04:58:43.1084106Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:43.1085373Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:43.1099653Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:44.4694122Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:44.4694630Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:44.4884678Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:58:44.4885594Z warnings.warn( 2022-05-18T04:58:44.4886711Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:58:44.4887532Z warnings.warn( 2022-05-18T04:58:45.0004455Z ok (2.833s) 2022-05-18T04:58:45.0156353Z test_summon_single_param (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86690 2022-05-18T04:58:45.0264563Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86691 2022-05-18T04:58:45.9410338Z dist init r=1, world=2 2022-05-18T04:58:45.9413919Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:45.9435438Z dist init r=0, world=2 2022-05-18T04:58:45.9440289Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:45.9441516Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:45.9517415Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:47.3003435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:47.3004011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:47.3204914Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:58:47.3205690Z warnings.warn( 2022-05-18T04:58:47.3206469Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:58:47.3207024Z warnings.warn( 2022-05-18T04:58:47.6335346Z ok (2.633s) 2022-05-18T04:58:47.6485594Z test_summon_full_param_writeback_writeback_False_modify_outer_False_mixed_precision_False (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86769 2022-05-18T04:58:48.5670897Z dist init r=0, world=1 2022-05-18T04:58:48.5674752Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:48.5675633Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:58:49.8185154Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:50.0547032Z ok (2.421s) 2022-05-18T04:58:50.0695224Z test_summon_full_param_writeback_writeback_False_modify_outer_False_mixed_precision_True (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86809 2022-05-18T04:58:50.9919934Z dist init r=0, world=1 2022-05-18T04:58:50.9923543Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:50.9924889Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:58:52.2617079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:52.5758079Z ok (2.521s) 2022-05-18T04:58:52.5899516Z test_summon_full_param_writeback_writeback_False_modify_outer_True_mixed_precision_False (__main__.TestSummonFullParamsNoShard) ... 
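The "Module is input on CPU, we are moving it to 0/1 ..." warnings above come from handing FSDP a module that still lives on the host, so FSDP temporarily moves it to the rank's GPU for parameter verification, flattening, and sharding. Placing the module on its device before wrapping avoids that round trip. A small sketch, assuming the process group is initialized and one visible GPU per rank (the helper name is illustrative):

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_local_gpu(module: torch.nn.Module) -> FSDP:
    # With one process per device, this rank's GPU is simply rank % device_count.
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)
    # Moving the module onto its GPU before constructing FSDP avoids the
    # "Module is input on CPU, we are moving it to ..." warning seen above.
    return FSDP(module.to(local_rank))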
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86849 2022-05-18T04:58:53.5131957Z dist init r=0, world=1 2022-05-18T04:58:53.5135457Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:53.5136890Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:58:54.7783931Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:54.9961522Z ok (2.420s) 2022-05-18T04:58:55.0101551Z test_summon_full_param_writeback_writeback_False_modify_outer_True_mixed_precision_True (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86889 2022-05-18T04:58:55.9288194Z dist init r=0, world=1 2022-05-18T04:58:55.9291576Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:55.9292804Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:58:57.1797589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:57.4162571Z ok (2.420s) 2022-05-18T04:58:57.4303868Z test_summon_full_param_writeback_writeback_True_modify_outer_False_mixed_precision_False (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86929 2022-05-18T04:58:58.3449925Z dist init r=0, world=1 2022-05-18T04:58:58.3453686Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:58.3455287Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:58:59.6170763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:59.9366651Z ok (2.520s) 2022-05-18T04:58:59.9507766Z test_summon_full_param_writeback_writeback_True_modify_outer_False_mixed_precision_True (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86969 2022-05-18T04:59:00.8622482Z dist init r=0, world=1 2022-05-18T04:59:00.8626237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:00.8627685Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:59:02.1328075Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:02.3570102Z ok (2.420s) 2022-05-18T04:59:02.3712751Z test_summon_full_param_writeback_writeback_True_modify_outer_True_mixed_precision_False (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87009 2022-05-18T04:59:03.2916173Z dist init r=0, world=1 2022-05-18T04:59:03.2919572Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:03.2920656Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2022-05-18T04:59:04.5367604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:04.7774868Z ok (2.420s) 2022-05-18T04:59:04.7912976Z test_summon_full_param_writeback_writeback_True_modify_outer_True_mixed_precision_True (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87049 2022-05-18T04:59:05.7208019Z dist init r=0, world=1 2022-05-18T04:59:05.7211789Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:05.7212839Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T04:59:06.9870823Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:07.2975934Z ok (2.520s) 2022-05-18T04:59:07.2976121Z 2022-05-18T04:59:07.2976514Z ---------------------------------------------------------------------- 2022-05-18T04:59:07.2977067Z Ran 73 tests in 194.491s 2022-05-18T04:59:07.2980082Z 2022-05-18T04:59:07.2981363Z OK 2022-05-18T04:59:07.2981639Z 2022-05-18T04:59:07.2981787Z Generating XML reports... 2022-05-18T04:59:07.3079198Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParams-20220518045552.xml 2022-05-18T04:59:07.3089160Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParamsNoShard-20220518045552.xml 2022-05-18T04:59:07.5799667Z Running distributed/optim/test_zero_redundancy_optimizer ... [2022-05-18 04:59:07.579454] 2022-05-18T04:59:07.5800437Z Executing ['/opt/conda/bin/python', 'distributed/optim/test_zero_redundancy_optimizer.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 04:59:07.579560] 2022-05-18T04:59:08.7276685Z Test results will be stored in test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer 2022-05-18T04:59:08.7298949Z 2022-05-18T04:59:08.7299233Z Running tests... 2022-05-18T04:59:08.7299663Z ---------------------------------------------------------------------- 2022-05-18T04:59:08.7319493Z test_add_param_group (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:10.3207708Z Check that ZeroRedundancyOptimizer properly handles adding a new ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:59:10.3354229Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/67287 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. (1.605s) 2022-05-18T04:59:10.3371656Z test_collect_shards (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:10.3635479Z Check the state consolidation mechanism and the state dict exposed ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87124 2022-05-18T04:59:10.3748532Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87125 2022-05-18T04:59:11.5052953Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:11.5055113Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:11.5108639Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:11.5112486Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:11.5113306Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:11.5158017Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:13.3830960Z ok (3.047s) 2022-05-18T04:59:13.3846068Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:13.3975374Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87207 2022-05-18T04:59:13.4084681Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87208 2022-05-18T04:59:14.5969780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:14.5971997Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:14.5992670Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:14.5996637Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:14.5997705Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:14.6075115Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:15.9896726Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:15.9899132Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:16.1841866Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:16.1842422Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:16.2267984Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:16.2268507Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:16.5167316Z ok (3.133s) 2022-05-18T04:59:16.5182089Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:16.5312445Z Check that overlapping DDP with ZeRO using the given method determined ... 
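test_collect_shards above exercises ZeroRedundancyOptimizer's sharded state: each rank stores optimizer state only for its own shard of the parameters, so reading a full state_dict() first requires consolidating the shards onto one rank. A minimal sketch of that flow, assuming a multi-rank NCCL process group is already up with one GPU per rank (the tiny Linear model is a placeholder):

import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer

def full_optimizer_state_on_rank0():
    torch.cuda.set_device(dist.get_rank())
    model = torch.nn.Linear(8, 8).cuda()
    # Optimizer state (e.g. Adam moments) is sharded across ranks instead of replicated.
    opt = ZeroRedundancyOptimizer(
        model.parameters(), optimizer_class=torch.optim.Adam, lr=1e-3
    )
    model(torch.randn(4, 8, device="cuda")).sum().backward()
    opt.step()
    # Gather every rank's shard onto rank 0; only that rank should then read state_dict().
    opt.consolidate_state_dict(to=0)
    return opt.state_dict() if dist.get_rank() == 0 else None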
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87320 2022-05-18T04:59:16.5420744Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87321 2022-05-18T04:59:17.7063419Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:17.7065815Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:17.7252144Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:17.7255889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:17.7257014Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:17.7270698Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:19.1385845Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:19.1397079Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:19.3356397Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:19.3356947Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:19.3793660Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:19.3794143Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:19.7502007Z ok (3.233s) 2022-05-18T04:59:19.7517745Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:19.7646168Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87433 2022-05-18T04:59:19.7753993Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87434 2022-05-18T04:59:20.9258634Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:20.9261005Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:20.9423344Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:20.9427271Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:20.9428092Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:20.9465348Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:22.3294967Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:22.3297159Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:22.5314324Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
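The repeated "Using the functional optimizer instead of ... since `overlap_with_ddp=True`" lines come from ZeroRedundancyOptimizer's DDP-overlap mode, in which the sharded optimizer step is folded into a DDP communication hook rather than called explicitly; the shard_buckets and use_interleaved_hook parameters in the test names select hook variants. A rough sketch of the wiring, assuming an initialized process group; the hook imports are assumed to live under torch.distributed.algorithms.ddp_comm_hooks, and the warm-up behaviour is simplified:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.distributed.algorithms.ddp_comm_hooks.ddp_zero_hook import hook_with_zero_step
from torch.distributed.algorithms.ddp_comm_hooks.default_hooks import allreduce_hook

def build_overlapped_ddp_zero(local_rank: int):
    torch.cuda.set_device(local_rank)
    model = torch.nn.Linear(16, 16).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])
    # overlap_with_ddp=True switches ZeRO to a functional optimizer driven from a
    # DDP comm hook, which is what the INFO lines above are reporting.
    zero_optim = ZeroRedundancyOptimizer(
        ddp_model.parameters(),
        optimizer_class=torch.optim.Adam,
        overlap_with_ddp=True,
        lr=1e-3,
    )
    # Fuse the sharded optimizer step into the gradient all-reduce hook.
    ddp_model.register_comm_hook(
        None, hook_with_zero_step(allreduce_hook, ddp_model, zero_optim)
    )
    return ddp_model, zero_optim

In this mode the training loop only runs forward and backward; after a couple of warm-up iterations (during which DDP also rebuilds its reducer buckets, hence the "Reducer buckets have been rebuilt" lines) the parameter update happens inside the hook rather than via an explicit zero_optim.step() call.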
2022-05-18T04:59:22.5314874Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:22.5768770Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:22.5769274Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:22.8833632Z ok (3.133s) 2022-05-18T04:59:22.8848240Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_False_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:22.8977511Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87546 2022-05-18T04:59:22.9085256Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87547 2022-05-18T04:59:24.0429443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:24.0431575Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:24.0440041Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:24.0443791Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:24.0444820Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:24.0534324Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:25.4283968Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:25.4286328Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:25.6292401Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:25.6292958Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:25.6764676Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:25.6765163Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:26.0167493Z ok (3.133s) 2022-05-18T04:59:26.0182396Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:26.0312560Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87659 2022-05-18T04:59:26.0421826Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87660 2022-05-18T04:59:27.2063939Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:27.2065289Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:27.2066326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:27.2070151Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:27.2071591Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:27.2168597Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:28.6018942Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:28.6022335Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:28.7891537Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:28.7892062Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:28.8266279Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:28.8266783Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:29.1503502Z ok (3.133s) 2022-05-18T04:59:29.1519071Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:29.1650008Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87772 2022-05-18T04:59:29.1760348Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87773 2022-05-18T04:59:30.3056215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:30.3058481Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:30.3173857Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:30.3178226Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:30.3179136Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:30.3263055Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:31.7023362Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:31.7026076Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:31.8896593Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:59:31.9284619Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:31.9285113Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:31.9285601Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:32.2840757Z ok (3.134s) 2022-05-18T04:59:32.2855925Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:32.2986094Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87885 2022-05-18T04:59:32.3093759Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87886 2022-05-18T04:59:33.4428790Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:33.4431514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:33.4461534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:33.4465241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:33.4466379Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:33.4534319Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:34.8312737Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:34.8315298Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:35.0289987Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:35.0290574Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:35.0713840Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:35.0714342Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:35.4175245Z ok (3.133s) 2022-05-18T04:59:35.4191014Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_False_gradient_as_bucket_view_True_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:35.4321802Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87998 2022-05-18T04:59:35.4609184Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87999 2022-05-18T04:59:36.6104039Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:36.6106379Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:36.6414387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:36.6418229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:36.6419320Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:36.6514510Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:38.0069236Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:38.0071612Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:38.2093660Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:38.2094189Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:38.2535046Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:38.2535552Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:38.5692106Z ok (3.152s) 2022-05-18T04:59:38.5707432Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:38.5836172Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88111 2022-05-18T04:59:38.5946739Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88112 2022-05-18T04:59:39.7598281Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:39.7600690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:39.7611598Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:39.7615060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:39.7616204Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:39.7703874Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:41.1467218Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:41.1469523Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:41.3367907Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:59:41.3368461Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:41.3813013Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:41.3813539Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:41.7030571Z ok (3.134s) 2022-05-18T04:59:41.7046699Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:41.7181266Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88224 2022-05-18T04:59:41.7292099Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88225 2022-05-18T04:59:42.8914990Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:42.8917361Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:42.9076607Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:42.9079949Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:42.9080846Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:42.9121890Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:44.2924331Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:44.2926538Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:44.4820517Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:44.4821071Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:44.5265660Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:44.5266522Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:44.8372483Z ok (3.134s) 2022-05-18T04:59:44.8387935Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:44.8515015Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88337 2022-05-18T04:59:44.8623855Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88338 2022-05-18T04:59:46.0469352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:46.0471626Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:46.0922617Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:46.0926183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:46.0927120Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:46.0981091Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:47.4469190Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:47.4471314Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:47.6461441Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:47.6462037Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:47.6951753Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:47.6952260Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:48.0707649Z ok (3.233s) 2022-05-18T04:59:48.0722652Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_False_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:48.0853013Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88450 2022-05-18T04:59:48.0960257Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88451 2022-05-18T04:59:49.2560428Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:49.2563038Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:49.2657344Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:49.2660992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:49.2662135Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:49.2665952Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:50.6487573Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:50.6489673Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:50.8454070Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:59:50.8454597Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:50.8953222Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:50.8953725Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:51.2040933Z ok (3.133s) 2022-05-18T04:59:51.2055506Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_False_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:51.2185448Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88563 2022-05-18T04:59:51.2295362Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88564 2022-05-18T04:59:52.3873638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:52.3875799Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:52.3893674Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:52.3897567Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:52.3898860Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:52.3978964Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:53.7672887Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:53.7676139Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:53.9581539Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:53.9582439Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:54.0042747Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:54.0043254Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:54.3376926Z ok (3.133s) 2022-05-18T04:59:54.3392792Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_False_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:54.3523236Z Check that overlapping DDP with ZeRO using the given method determined ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88676 2022-05-18T04:59:54.3632990Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88677 2022-05-18T04:59:55.4992942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:55.4995126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:55.5004558Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:55.5008234Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:55.5009081Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:55.5099894Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:56.8914539Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:56.8916777Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T04:59:57.0800299Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:57.0800866Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:57.1225422Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:57.1225949Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:59:57.4715703Z ok (3.134s) 2022-05-18T04:59:57.4731223Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_True_shard_buckets_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T04:59:57.4866412Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88789 2022-05-18T04:59:57.4976839Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88790 2022-05-18T04:59:58.6770656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:58.6772889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:58.6776927Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:58.6780567Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:58.6781480Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:58.6876025Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:00.0628148Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T05:00:00.0630424Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T05:00:00.2591240Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T05:00:00.2592013Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:00:00.3061256Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:00:00.3061739Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:00:00.6056226Z ok (3.134s) 2022-05-18T05:00:00.6072370Z test_ddp_zero_overlap_use_gpu_True_use_interleaved_hook_True_gradient_as_bucket_view_True_static_graph_True_shard_buckets_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:00.6202847Z Check that overlapping DDP with ZeRO using the given method determined ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88902 2022-05-18T05:00:00.6312717Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88903 2022-05-18T05:00:01.8096015Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:01.8097701Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:01.8159812Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:01.8163557Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:01.8164372Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:01.8200770Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:03.2402019Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T05:00:03.2404254Z INFO:torch.distributed.optim.zero_redundancy_optimizer:Using the functional optimizer instead of since `overlap_with_ddp=True` 2022-05-18T05:00:03.4416144Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:00:03.4416686Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:00:03.4897541Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:00:03.4898073Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:00:03.8397435Z ok (3.234s) 2022-05-18T05:00:03.8437738Z test_local_optimizer_parity_optimizer_class_str_AdamW_maximize_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:03.8567835Z When combined with DDP, check that a local optimizer gives the same ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89015 2022-05-18T05:00:03.8678010Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89016 2022-05-18T05:00:05.0077610Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:05.0079715Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:05.0096653Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:05.0100097Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:05.0101786Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:05.0182957Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:06.2937720Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy7hyc_3q 2022-05-18T05:00:06.2939894Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy7hyc_3q/_remote_module_non_scriptable.py 2022-05-18T05:00:06.3008586Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsry6cu84 2022-05-18T05:00:06.3012665Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsry6cu84/_remote_module_non_scriptable.py 2022-05-18T05:00:06.5669126Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T05:00:06.5713986Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2022-05-18T05:00:06.8336785Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.8348606Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.8529927Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.8542597Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.8725182Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.8737937Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.8921116Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.8933802Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.9116089Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.9128965Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.9311204Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.9324124Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.9506562Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.9520050Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.9825793Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:06.9842357Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:07.2767131Z ok (3.437s) 2022-05-18T05:00:07.2808175Z test_local_optimizer_parity_optimizer_class_str_AdamW_maximize_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:07.2940321Z When combined with DDP, check that a local optimizer gives the same 
... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89098 2022-05-18T05:00:07.3051915Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89099 2022-05-18T05:00:08.4546106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:08.4548465Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:08.4704459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:08.4708274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:08.4709433Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:08.4753216Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:09.7556944Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjvgq2npw 2022-05-18T05:00:09.7559012Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjvgq2npw/_remote_module_non_scriptable.py 2022-05-18T05:00:09.7677329Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsqk03wn1 2022-05-18T05:00:09.7681216Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsqk03wn1/_remote_module_non_scriptable.py 2022-05-18T05:00:10.0208522Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T05:00:10.0239246Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2022-05-18T05:00:10.2877625Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.2887137Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.3074070Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.3083578Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.3270533Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.3279495Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.3465738Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.3475538Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.3662295Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.3671639Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.3857915Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.3867463Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.4053655Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.4063489Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.4371565Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.4399422Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:10.8142519Z ok (3.537s) 2022-05-18T05:00:10.8183202Z test_local_optimizer_parity_optimizer_class_str_Adam_maximize_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:10.8317182Z When combined with DDP, check that a local optimizer gives the same 
... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89181 2022-05-18T05:00:10.8427191Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89182 2022-05-18T05:00:11.9791139Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:11.9793296Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:11.9907721Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:11.9911407Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:11.9912716Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:11.9998282Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:13.2605316Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsj40v319 2022-05-18T05:00:13.2607491Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsj40v319/_remote_module_non_scriptable.py 2022-05-18T05:00:13.2775480Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo6ez4k8a 2022-05-18T05:00:13.2779373Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo6ez4k8a/_remote_module_non_scriptable.py 2022-05-18T05:00:13.5295799Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T05:00:13.5352135Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2022-05-18T05:00:13.8043101Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.8058725Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.8242084Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.8258462Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.8446900Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.8459657Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.8644440Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.8660648Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.8845712Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.8864222Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.9049292Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.9064647Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.9250601Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.9266026Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.9576736Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:13.9608550Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:14.3517787Z ok (3.537s) 2022-05-18T05:00:14.3558574Z test_local_optimizer_parity_optimizer_class_str_Adam_maximize_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:14.3688873Z When combined with DDP, check that a local optimizer gives the same 
... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89264 2022-05-18T05:00:14.3797680Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89265 2022-05-18T05:00:15.5158623Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:15.5161019Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:15.5186989Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:15.5190677Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:15.5191806Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:15.5264438Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:16.7848076Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5d23qkh3 2022-05-18T05:00:16.7849747Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5d23qkh3/_remote_module_non_scriptable.py 2022-05-18T05:00:16.8095253Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz1esvec5 2022-05-18T05:00:16.8098840Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz1esvec5/_remote_module_non_scriptable.py 2022-05-18T05:00:17.0610965Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T05:00:17.0634726Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2022-05-18T05:00:17.3334055Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.3349212Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.3535856Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.3551641Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.3738918Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.3754056Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.3940336Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.3955603Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.4141947Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.4157010Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.4343309Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.4358784Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.4545572Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.4560916Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.4870756Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.4899338Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:17.7884650Z ok (3.437s) 2022-05-18T05:00:17.7925784Z test_local_optimizer_parity_optimizer_class_str_SGD_maximize_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:17.8055554Z When combined with DDP, check that a local optimizer gives the same 
... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89347 2022-05-18T05:00:17.8165086Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89348 2022-05-18T05:00:18.9678992Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:18.9680910Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:18.9752435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:18.9755745Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:18.9756585Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:18.9783923Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:20.2431759Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5onccanh 2022-05-18T05:00:20.2433929Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5onccanh/_remote_module_non_scriptable.py 2022-05-18T05:00:20.2672425Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkam20y7j 2022-05-18T05:00:20.2676207Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkam20y7j/_remote_module_non_scriptable.py 2022-05-18T05:00:20.5198260Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T05:00:20.5260923Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2022-05-18T05:00:20.7755792Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.7769104Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.7942900Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.7956668Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.8130167Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.8148402Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.8321712Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.8333885Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.8507779Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.8520873Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.8694081Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.8707177Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.8880844Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.8894569Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.9098440Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:20.9102337Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:21.2254216Z ok (3.437s) 2022-05-18T05:00:21.2296020Z test_local_optimizer_parity_optimizer_class_str_SGD_maximize_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:21.2428680Z When combined with DDP, check that a local optimizer gives the same 
... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89430 2022-05-18T05:00:21.2541663Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89431 2022-05-18T05:00:22.4032552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:22.4034428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:22.4043380Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:22.4046979Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:22.4047800Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:22.4137714Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:23.6881392Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprl43spnp 2022-05-18T05:00:23.6883949Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprl43spnp/_remote_module_non_scriptable.py 2022-05-18T05:00:23.7011530Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxym7wmbk 2022-05-18T05:00:23.7015477Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxym7wmbk/_remote_module_non_scriptable.py 2022-05-18T05:00:23.9556569Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T05:00:23.9572991Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2022-05-18T05:00:24.2097628Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.2112051Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.2287497Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.2300904Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.2475549Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.2490235Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.2665040Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.2678989Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.2854965Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.2868017Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.3043212Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.3056101Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.3231105Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.3245009Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.3452028Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.3456323Z WARNING:torch.distributed.optim.zero_redundancy_optimizer:ZeroRedundancyOptimizer detected that the trainable parameters changed; rebuilding the parameter buckets if enabled 2022-05-18T05:00:24.6631459Z ok (3.437s) 2022-05-18T05:00:24.6645016Z test_lr_scheduler (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:24.6774662Z Check that a normal PyTorch ``lr_scheduler`` is usable with ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89513 2022-05-18T05:00:24.6884646Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89514 2022-05-18T05:00:25.8565071Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:25.8567345Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:25.8599706Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:25.8603027Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:25.8604353Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:25.8670631Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:27.4960804Z ok (2.833s) 2022-05-18T05:00:27.4988804Z test_multiple_param_groups (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:27.5114937Z Check parity between constructing ZeRO with multiple parameter groups ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89596 2022-05-18T05:00:27.5526970Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89597 2022-05-18T05:00:28.7345292Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:28.7347603Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:28.7461046Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:28.7464820Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:28.7465737Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:28.7552890Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:30.5607953Z ok (3.065s) 2022-05-18T05:00:30.5638359Z test_nondefault_process_group (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:30.5770860Z Check that ZeroRedundancyOptimizer works with a non-default process ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89679 2022-05-18T05:00:30.5878898Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89680 2022-05-18T05:00:31.7293160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:31.7293765Z INFO:torch.testing._internal.common_distributed:Skipping `test_nondefault_process_group()` since world size of 2 is less than 4 2022-05-18T05:00:31.7316378Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:31.7318852Z INFO:torch.testing._internal.common_distributed:Skipping `test_nondefault_process_group()` since world size of 2 is less than 4 2022-05-18T05:00:31.8922204Z ok (1.331s) 2022-05-18T05:00:31.8933022Z test_sharding (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:31.8936519Z Check ZeroRedundancyOptimizer's parameter sharding at construction ... 
skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/67295 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2022-05-18T05:00:31.8953854Z test_step (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:31.9082695Z Check that ZeroRedundancyOptimizer properly exposes the ``step()`` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89747 2022-05-18T05:00:31.9194166Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89748 2022-05-18T05:00:33.0555864Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:33.0558479Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:33.0570354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:33.0574026Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:33.0575406Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:33.0661352Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:34.9276682Z ok (3.034s) 2022-05-18T05:00:34.9298089Z test_step_with_closure (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:34.9427913Z Check that ZeroRedundancyOptimizer properly exposes the ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89830 2022-05-18T05:00:34.9536516Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89831 2022-05-18T05:00:36.0902025Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:36.0904173Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:36.1449583Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:36.1453454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:36.1454270Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:36.1515761Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:37.9616603Z ok (3.034s) 2022-05-18T05:00:37.9621919Z test_zero_join_cpu (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:37.9751738Z Check that the ZeRO join hook allows training with uneven inputs ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89913 2022-05-18T05:00:37.9860307Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89914 2022-05-18T05:00:39.1342671Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:39.1357579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:39.1471929Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:39.1474031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:39.1474878Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:39.1475597Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:39.1583054Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5a2qeito 2022-05-18T05:00:39.1585733Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5a2qeito/_remote_module_non_scriptable.py 2022-05-18T05:00:39.1586696Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvpicdu27 2022-05-18T05:00:39.1590029Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvpicdu27/_remote_module_non_scriptable.py 2022-05-18T05:00:39.1803264Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:00:39.1803783Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:00:39.2214206Z /opt/conda/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up 2022-05-18T05:00:39.2214688Z _warnings.warn(warn_message, ResourceWarning) 2022-05-18T05:00:39.2217118Z /opt/conda/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up 2022-05-18T05:00:39.2217607Z _warnings.warn(warn_message, ResourceWarning) 2022-05-18T05:00:39.3906146Z ok (1.429s) 2022-05-18T05:00:39.3911653Z test_zero_join_gpu (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:39.4040646Z Check that the ZeRO join hook allows training with uneven inputs ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89991 2022-05-18T05:00:39.4149403Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89992 2022-05-18T05:00:40.5281023Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:40.5288504Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:40.5569555Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:40.5578977Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:40.5580433Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:40.5595495Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:00:41.8331987Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvjkp54ma 2022-05-18T05:00:41.8333163Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvjkp54ma/_remote_module_non_scriptable.py 2022-05-18T05:00:41.8778595Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbsqssiok 2022-05-18T05:00:41.8781080Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbsqssiok/_remote_module_non_scriptable.py 2022-05-18T05:00:42.1336364Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:00:42.2100025Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:00:42.2101694Z /opt/conda/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up 2022-05-18T05:00:42.2102720Z _warnings.warn(warn_message, ResourceWarning) 2022-05-18T05:00:42.2104380Z /opt/conda/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up 2022-05-18T05:00:42.2105398Z _warnings.warn(warn_message, ResourceWarning) 2022-05-18T05:00:42.5234862Z ok (3.133s) 2022-05-18T05:00:42.5242255Z test_zero_model_parallel_parameters_as_bucket_view_False (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:42.5372819Z Check that ZeRO works with model parallelism where the model's ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90074 2022-05-18T05:00:42.5481635Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90075 2022-05-18T05:00:43.7192308Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:43.7333165Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:43.9527454Z skip: Need at least 4 CUDA devices (1.429s) 2022-05-18T05:00:43.9534624Z test_zero_model_parallel_parameters_as_bucket_view_True (__main__.TestZeroRedundancyOptimizerDistributed) 2022-05-18T05:00:43.9664284Z Check that ZeRO works with model parallelism where the model's ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90142 2022-05-18T05:00:43.9776202Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90143 2022-05-18T05:00:45.1468889Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:45.1504567Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:45.2818561Z skip: Need at least 4 CUDA devices (1.329s) 2022-05-18T05:00:45.2841225Z test_constructor (__main__.TestZeroRedundancyOptimizerSingleRank) 2022-05-18T05:00:45.2974531Z Check the robustness of the ZeroRedundancyOptimizer constructor by ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90210 2022-05-18T05:00:46.4404721Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:46.4406594Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:46.4407719Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:00:46.6013767Z ok (1.319s) 2022-05-18T05:00:46.6027583Z test_lr_scheduler (__main__.TestZeroRedundancyOptimizerSingleRank) 2022-05-18T05:00:46.6158098Z Check that a normal PyTorch ``lr_scheduler`` is usable with ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90245 2022-05-18T05:00:47.7511248Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:47.7513262Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:47.7514426Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:00:49.2223868Z ok (2.621s) 2022-05-18T05:00:49.2234337Z test_same_dense_param_type (__main__.TestZeroRedundancyOptimizerSingleRank) 2022-05-18T05:00:49.2361513Z Check that ZeroRedundancyOptimizer raises an exception if the input ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90287 2022-05-18T05:00:50.3883783Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:50.3885848Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:50.3886782Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:00:50.5399339Z ok (1.317s) 2022-05-18T05:00:50.5427421Z test_state_dict (__main__.TestZeroRedundancyOptimizerSingleRank) 2022-05-18T05:00:50.5555661Z Check that ZeroRedundancyOptimizer exposes the expected state dict ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90322 2022-05-18T05:00:51.7053503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:51.7055432Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:51.7056603Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:00:53.2624682Z ok (2.722s) 2022-05-18T05:00:53.2636604Z test_step_with_extra_inner_key (__main__.TestZeroRedundancyOptimizerSingleRank) 2022-05-18T05:00:53.2766308Z Check that ZeroRedundancyOptimizer wrapping an optimizer that adds ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90364 2022-05-18T05:00:54.3922242Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:54.3924272Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:54.3925549Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:00:55.8834732Z ok (2.621s) 2022-05-18T05:00:55.8846853Z test_step_with_kwargs (__main__.TestZeroRedundancyOptimizerSingleRank) 2022-05-18T05:00:55.8977054Z Check that the ``step(**kwargs)`` interface is properly exposed. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90406 2022-05-18T05:00:57.0102071Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:57.0104193Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:57.0105032Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2022-05-18T05:00:58.5044319Z ok (2.621s) 2022-05-18T05:00:58.5054783Z test_step_without_closure (__main__.TestZeroRedundancyOptimizerSingleRank) 2022-05-18T05:00:58.5186929Z Check that the ``step()`` method (without closure) is handled as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90448 2022-05-18T05:00:59.6677091Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:59.6679103Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:59.6679923Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:01:01.2256162Z ok (2.721s) 2022-05-18T05:01:01.2267507Z test_zero_grad (__main__.TestZeroRedundancyOptimizerSingleRank) 2022-05-18T05:01:01.2399458Z Check that the ``zero_grad`` method is properly handled. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90490 2022-05-18T05:01:02.3954973Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:02.3956281Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:02.3957251Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:01:02.5439679Z ok (1.318s) 2022-05-18T05:01:02.5439883Z 2022-05-18T05:01:02.5440266Z ---------------------------------------------------------------------- 2022-05-18T05:01:02.5440615Z Ran 42 tests in 113.814s 2022-05-18T05:01:02.5441106Z 2022-05-18T05:01:02.5447556Z OK (skipped=4) 2022-05-18T05:01:02.5447740Z 2022-05-18T05:01:02.5447876Z Generating XML reports... 2022-05-18T05:01:02.5538091Z Generated XML report: test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerDistributed-20220518045908.xml 2022-05-18T05:01:02.5549773Z Generated XML report: test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerSingleRank-20220518045908.xml 2022-05-18T05:01:02.8363652Z Running distributed/_shard/sharded_tensor/test_sharded_tensor ... [2022-05-18 05:01:02.835873] 2022-05-18T05:01:02.8364430Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/test_sharded_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:01:02.835975] 2022-05-18T05:01:03.7531376Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor 2022-05-18T05:01:03.7564167Z 2022-05-18T05:01:03.7564415Z Running tests... 2022-05-18T05:01:03.7564854Z ---------------------------------------------------------------------- 2022-05-18T05:01:05.3015462Z test_empty (__main__.TestCreateTensorFromParams) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:05.3172904Z ok (1.561s) 2022-05-18T05:01:05.3439898Z test_local_tensor (__main__.TestLocalTensor) ... 
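Editor's context note: the TestZeroRedundancyOptimizer run that completed above (42 tests, 4 skipped) exercises the ZeroRedundancyOptimizer constructor, step()/zero_grad(), state-dict handling, and interoperability with a stock lr_scheduler. The following is a minimal single-rank sketch of that API for orientation only; the gloo backend, the file:// init method, and the tiny Linear model are assumptions chosen so the snippet runs on CPU without a launcher, and it is not a copy of the test harness.

    import os
    import tempfile

    import torch
    import torch.distributed as dist
    from torch.distributed.optim import ZeroRedundancyOptimizer
    from torch.optim.lr_scheduler import StepLR

    def main():
        # Single-process "world" so the sketch runs without torchrun (assumption).
        init_file = os.path.join(tempfile.mkdtemp(), "pg_init")
        dist.init_process_group("gloo", init_method=f"file://{init_file}",
                                rank=0, world_size=1)

        model = torch.nn.Linear(8, 4)
        # ZeRO wraps a regular optimizer class; per-parameter state is sharded across ranks.
        opt = ZeroRedundancyOptimizer(model.parameters(),
                                      optimizer_class=torch.optim.Adam, lr=1e-3)
        # A stock scheduler accepts the wrapper because it subclasses torch.optim.Optimizer
        # (this is what test_lr_scheduler checks).
        sched = StepLR(opt, step_size=1)

        loss = model(torch.randn(2, 8)).sum()
        loss.backward()
        opt.step()       # wrapped Adam step on this rank's shard (cf. test_step_without_closure)
        sched.step()
        opt.zero_grad()  # cf. test_zero_grad

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()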
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90562 2022-05-18T05:01:05.3552516Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90563 2022-05-18T05:01:05.3665825Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 90564 2022-05-18T05:01:05.3783881Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 90565 2022-05-18T05:01:06.2799371Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:06.3019096Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:06.3213578Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:06.3469416Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:06.5830858Z skip: Need at least 4 CUDA devices (1.265s) 2022-05-18T05:01:06.5972772Z test_local_tensor_error (__main__.TestLocalTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90698 2022-05-18T05:01:06.6081550Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90699 2022-05-18T05:01:06.6194304Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 90700 2022-05-18T05:01:06.6310403Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 90701 2022-05-18T05:01:07.5680100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:07.6072986Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:07.6316233Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:07.6393986Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:07.8353362Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:01:07.8494850Z test_collect_local_shard (__main__.TestModuleHookApi) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90834 2022-05-18T05:01:07.8603396Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90835 2022-05-18T05:01:07.8715510Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 90836 2022-05-18T05:01:07.8829619Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 90837 2022-05-18T05:01:08.8418342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:08.8548224Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:08.8692361Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:08.8700384Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:09.0871801Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:01:09.1018197Z test_reshard_output (__main__.TestModuleHookApi) ... 
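Editor's context note: nearly every case in distributed/_shard/sharded_tensor/test_sharded_tensor.py below spawns four worker processes and is then reported as "skip: Need at least 4 CUDA devices", because this runner exposes fewer than four GPUs. The guard behind that message is essentially a device-count check; the stand-in decorator below is a simplified illustration (the real helper lives in torch.testing._internal.common_distributed, and the exact name and structure used here are assumptions):

    import unittest
    import torch

    REQUIRED_GPUS = 4  # the threshold quoted in the skip messages above

    def skip_if_lt_x_gpu(x):
        # Skip the decorated test when fewer than x CUDA devices are visible.
        return unittest.skipIf(
            torch.cuda.device_count() < x,
            f"Need at least {x} CUDA devices",
        )

    class Example(unittest.TestCase):
        @skip_if_lt_x_gpu(REQUIRED_GPUS)
        def test_needs_four_gpus(self):
            self.assertGreaterEqual(torch.cuda.device_count(), REQUIRED_GPUS)

    if __name__ == "__main__":
        unittest.main()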
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90970 2022-05-18T05:01:09.1126855Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90971 2022-05-18T05:01:09.1240078Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 90972 2022-05-18T05:01:09.1357504Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 90973 2022-05-18T05:01:10.0799821Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:10.1022180Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:10.1180488Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:10.1186634Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:10.3399404Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:10.3549013Z test_shard_parameter (__main__.TestShardParameter) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91106 2022-05-18T05:01:10.3660146Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91107 2022-05-18T05:01:10.3773723Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 91108 2022-05-18T05:01:10.3889846Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 91109 2022-05-18T05:01:11.3034500Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:11.3451634Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:11.3510796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:11.3711796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:11.5931787Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:11.6082367Z test_shard_parameter_errors (__main__.TestShardParameter) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91242 2022-05-18T05:01:11.6191091Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91243 2022-05-18T05:01:11.6303485Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 91244 2022-05-18T05:01:11.6421666Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 91245 2022-05-18T05:01:12.5594483Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:12.5869327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:12.6328636Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:12.6335029Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:12.8463369Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:12.8609252Z test_shard_tensor (__main__.TestShardTensor) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91378 2022-05-18T05:01:12.8718482Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91379 2022-05-18T05:01:12.8830726Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 91380 2022-05-18T05:01:12.8946564Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 91381 2022-05-18T05:01:13.8868227Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:13.8869089Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:13.8949818Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:13.9231569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:14.0989575Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:01:14.1140109Z test_shard_tensor_errors (__main__.TestShardTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91514 2022-05-18T05:01:14.1250659Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91515 2022-05-18T05:01:14.1362656Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 91516 2022-05-18T05:01:14.1480155Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 91517 2022-05-18T05:01:15.1153333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:15.1210864Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:15.1485630Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:15.2314325Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:15.4524383Z skip: Need at least 4 CUDA devices (1.353s) 2022-05-18T05:01:15.4665813Z test_cleanup (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91650 2022-05-18T05:01:15.4777210Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91651 2022-05-18T05:01:15.4893235Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 91652 2022-05-18T05:01:15.5009845Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 91653 2022-05-18T05:01:16.4026239Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:16.4055304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:16.4161329Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:16.4316842Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:16.6050169Z skip: Need at least 4 CUDA devices (1.152s) 2022-05-18T05:01:16.6209575Z test_complete_world_size (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91786 2022-05-18T05:01:16.6322614Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91787 2022-05-18T05:01:16.6438521Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 91788 2022-05-18T05:01:16.6557963Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 91789 2022-05-18T05:01:17.6344595Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:17.6641875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:17.6645071Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:17.7218372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:17.9602157Z skip: Need at least 4 CUDA devices (1.355s) 2022-05-18T05:01:17.9625651Z test_create_sharded_tensor_like (__main__.TestShardedTensorChunked) 2022-05-18T05:01:17.9759140Z Test tensor like methods, i.e. torch.zeros_like(...), torch.full_like, etc. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91922 2022-05-18T05:01:17.9869644Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91923 2022-05-18T05:01:17.9982181Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 91924 2022-05-18T05:01:18.0096263Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 91925 2022-05-18T05:01:19.0041228Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:19.0348190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:19.0378553Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:19.0679846Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:19.2138666Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:19.2154581Z test_create_sharded_tensor_with_full (__main__.TestShardedTensorChunked) 2022-05-18T05:01:19.2290594Z Test sharded_tensor.full(...) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92058 2022-05-18T05:01:19.2402752Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92059 2022-05-18T05:01:19.2515388Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 92060 2022-05-18T05:01:19.2633535Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 92061 2022-05-18T05:01:20.2218846Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:20.2345865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:20.2391272Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:20.2726883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:20.4675995Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:20.4688781Z test_create_sharded_tensor_with_ones (__main__.TestShardedTensorChunked) 2022-05-18T05:01:20.4824319Z Test sharded_tensor.ones(...) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92194 2022-05-18T05:01:20.4936369Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92195 2022-05-18T05:01:20.5050719Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 92196 2022-05-18T05:01:20.5167734Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 92197 2022-05-18T05:01:21.4526641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:21.4530181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:21.4531203Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:21.4841676Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:21.7210515Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:21.7230911Z test_create_sharded_tensor_with_rand (__main__.TestShardedTensorChunked) 2022-05-18T05:01:21.7367372Z Test sharded_tensor.rand(...)/randn(...) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92330 2022-05-18T05:01:21.7479026Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92331 2022-05-18T05:01:21.7594369Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 92332 2022-05-18T05:01:21.7716073Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 92333 2022-05-18T05:01:22.7653438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:22.8364839Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:22.8387730Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:22.8429059Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:23.0761603Z skip: Need at least 4 CUDA devices (1.355s) 2022-05-18T05:01:23.0774299Z test_create_sharded_tensor_with_zeros (__main__.TestShardedTensorChunked) 2022-05-18T05:01:23.0909065Z Test sharded_tensor.zeros(...) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92466 2022-05-18T05:01:23.1020786Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92467 2022-05-18T05:01:23.1137546Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 92468 2022-05-18T05:01:23.1254164Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 92469 2022-05-18T05:01:24.1118775Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:24.1128413Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:24.1229492Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:24.1414035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:24.3297499Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:24.3310475Z test_gather_even (__main__.TestShardedTensorChunked) 2022-05-18T05:01:24.3443598Z Test _sharded_tensor.gather(...) with evenly distributed._shards ... 
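Editor's context note: the TestShardedTensorChunked factory cases above (sharded_tensor.ones/zeros/full/rand) build tensors whose rows are chunk-sharded across ranks according to a ChunkShardingSpec. The sketch below shows that construction as I understand the public torch.distributed._shard API; the four-GPU placement list mirrors why these tests demand 4 CUDA devices, and the torchrun launch line is an assumption, so treat it as illustrative rather than a copy of the test code.

    import torch
    import torch.distributed as dist
    from torch.distributed._shard import sharded_tensor
    from torch.distributed._shard.sharding_spec import ChunkShardingSpec

    def build_sharded_ones():
        # Shard along dim 0, one chunk per rank; each placement names a rank and its device.
        spec = ChunkShardingSpec(
            dim=0,
            placements=[
                "rank:0/cuda:0",
                "rank:1/cuda:1",
                "rank:2/cuda:2",
                "rank:3/cuda:3",
            ],
        )
        # Each rank materializes only its own 4x16 chunk of the global 16x16 tensor.
        return sharded_tensor.ones(spec, 16, 16)

    if __name__ == "__main__":
        # Assumed launch: torchrun --nproc_per_node=4 this_file.py, one GPU per rank.
        dist.init_process_group("nccl")
        torch.cuda.set_device(dist.get_rank())
        st = build_sharded_ones()
        print(dist.get_rank(), st.local_shards()[0].tensor.shape)
        dist.destroy_process_group()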
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92602 2022-05-18T05:01:24.3555242Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92603 2022-05-18T05:01:24.3670071Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 92604 2022-05-18T05:01:24.3789521Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 92605 2022-05-18T05:01:25.3719392Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:25.3805979Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:25.4211906Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:25.4214901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:25.5832168Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:25.5844686Z test_gather_uneven (__main__.TestShardedTensorChunked) 2022-05-18T05:01:25.5977083Z Test _sharded_tensor.gather(...) with unevenly distributed._shards ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92738 2022-05-18T05:01:25.6087280Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92739 2022-05-18T05:01:25.6199534Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 92740 2022-05-18T05:01:25.6314663Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 92741 2022-05-18T05:01:26.5644592Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:26.6152787Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:26.6531117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:26.6555213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:26.8358782Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:01:26.8506384Z test_insufficient_sharding_dims (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92874 2022-05-18T05:01:26.8617077Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92875 2022-05-18T05:01:26.8729143Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 92876 2022-05-18T05:01:26.8847962Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 92877 2022-05-18T05:01:27.8060518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:27.8322757Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:27.8514337Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:27.8790762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:28.0889038Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:28.1033411Z test_invalid_pg_rpc_ranks (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93010 2022-05-18T05:01:28.1143285Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93011 2022-05-18T05:01:28.1255151Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 93012 2022-05-18T05:01:28.1369533Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 93013 2022-05-18T05:01:29.0527377Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:29.1125960Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:29.1308514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:29.1332985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:29.3412755Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:01:29.3570573Z test_invalid_sharding (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93146 2022-05-18T05:01:29.3680898Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93147 2022-05-18T05:01:29.3796726Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 93148 2022-05-18T05:01:29.3914609Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 93149 2022-05-18T05:01:30.3605144Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:30.4483178Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:30.4485354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:30.4885251Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:30.6959004Z skip: Need at least 4 CUDA devices (1.354s) 2022-05-18T05:01:30.7109007Z test_load_state_dict_errors (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93282 2022-05-18T05:01:30.7219418Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93283 2022-05-18T05:01:30.7334096Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 93284 2022-05-18T05:01:30.7451885Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 93285 2022-05-18T05:01:31.7244821Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:31.7538657Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:31.7614407Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:31.7735536Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:31.9492942Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:31.9644244Z test_multiple_local_shards (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93418 2022-05-18T05:01:31.9753185Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93419 2022-05-18T05:01:31.9865648Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 93420 2022-05-18T05:01:31.9981621Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 93421 2022-05-18T05:01:33.0040421Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:33.0131229Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:33.0139929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:33.0579061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:33.2023424Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:33.2182184Z test_new_group (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93554 2022-05-18T05:01:33.2290907Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93555 2022-05-18T05:01:33.2403821Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 93556 2022-05-18T05:01:33.2520814Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 93557 2022-05-18T05:01:34.1982420Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:34.2328601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:34.2692316Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:34.2830538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:34.4563591Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:34.4718686Z test_partial_world_size (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93690 2022-05-18T05:01:34.4831294Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93691 2022-05-18T05:01:34.4947013Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 93692 2022-05-18T05:01:34.5068989Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 93693 2022-05-18T05:01:35.4945682Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:35.5043315Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:35.5108136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:35.5449781Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:35.7111047Z skip: Need at least 4 CUDA devices (1.255s) 2022-05-18T05:01:35.7260653Z test_sharded_tensor_metadata (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93826 2022-05-18T05:01:35.7368975Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93827 2022-05-18T05:01:35.7482511Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 93828 2022-05-18T05:01:35.7597385Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 93829 2022-05-18T05:01:36.7105142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:36.7221942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:36.7272703Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:36.7277459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:36.9641848Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:36.9794313Z test_sharded_tensor_sizes (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93962 2022-05-18T05:01:36.9903564Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93963 2022-05-18T05:01:37.0015910Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 93964 2022-05-18T05:01:37.0132680Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 93965 2022-05-18T05:01:38.0190284Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:38.0284483Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:38.0334862Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:38.0805630Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:38.3176798Z skip: Need at least 4 CUDA devices (1.353s) 2022-05-18T05:01:38.3325296Z test_sharding_columns (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94098 2022-05-18T05:01:38.3435545Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94099 2022-05-18T05:01:38.3548227Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 94100 2022-05-18T05:01:38.3662701Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 94101 2022-05-18T05:01:39.3398271Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:39.3603371Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:39.3763553Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:39.3974495Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:39.5707721Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:39.5853393Z test_state_dict (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94234 2022-05-18T05:01:39.5964026Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94235 2022-05-18T05:01:39.6081353Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 94236 2022-05-18T05:01:39.6201542Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 94237 2022-05-18T05:01:40.6275650Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:40.6284050Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:40.6301448Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:40.6325281Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:40.8246901Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:40.8395078Z test_state_dict_new_group (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94370 2022-05-18T05:01:40.8511272Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94371 2022-05-18T05:01:40.8625644Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 94372 2022-05-18T05:01:40.8743193Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 94373 2022-05-18T05:01:41.7852459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:41.7954530Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:41.8939607Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:41.8946115Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:42.0785175Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:42.0930779Z test_state_dict_no_sharded_tensors (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94506 2022-05-18T05:01:42.1042500Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94507 2022-05-18T05:01:42.1158292Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 94508 2022-05-18T05:01:42.1282192Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 94509 2022-05-18T05:01:43.1094445Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:43.1153334Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:43.1240756Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:43.1381377Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:43.3323924Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:43.3470086Z test_custom_op (__main__.TestShardedTensorCustomOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94642 2022-05-18T05:01:43.3581736Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94643 2022-05-18T05:01:43.3698727Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 94644 2022-05-18T05:01:43.3817096Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 94645 2022-05-18T05:01:44.3047273Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:44.3677108Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:44.3723530Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:44.3989150Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:44.5859493Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:44.6001647Z test_custom_op_errors (__main__.TestShardedTensorCustomOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94778 2022-05-18T05:01:44.6111785Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94779 2022-05-18T05:01:44.6224371Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 94780 2022-05-18T05:01:44.6341383Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 94781 2022-05-18T05:01:45.6143203Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:45.6268715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:45.6391478Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:45.6805879Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:45.8385330Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:01:45.8531190Z test_custom_op_override (__main__.TestShardedTensorCustomOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94914 2022-05-18T05:01:45.8640048Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94915 2022-05-18T05:01:45.8753044Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 94916 2022-05-18T05:01:45.8871395Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 94917 2022-05-18T05:01:46.8508049Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:46.8538408Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:46.8637318Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:46.8812561Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:47.0914217Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:01:47.0931006Z test_create_sharded_tensor_with_ones (__main__.TestShardedTensorEnumerable) 2022-05-18T05:01:47.1064488Z Test sharded_tensor.ones(...) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95050 2022-05-18T05:01:47.1176787Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95051 2022-05-18T05:01:47.1291713Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 95052 2022-05-18T05:01:47.1411797Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 95053 2022-05-18T05:01:48.0547133Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:48.0749297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:48.1041736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:48.1305085Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:48.3454208Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:48.3471224Z test_gather_even (__main__.TestShardedTensorEnumerable) 2022-05-18T05:01:48.3606392Z Test _sharded_tensor.gather(...) with evenly distributed._shards ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95186 2022-05-18T05:01:48.3719618Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95187 2022-05-18T05:01:48.3835063Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 95188 2022-05-18T05:01:48.3952104Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 95189 2022-05-18T05:01:49.3355280Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:49.3357008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:49.3698597Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:49.3712079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:49.5995189Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:49.6011117Z test_gather_uneven (__main__.TestShardedTensorEnumerable) 2022-05-18T05:01:49.6143997Z Test _sharded_tensor.gather(...) with unevenly distributed._shards ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95322 2022-05-18T05:01:49.6256111Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95323 2022-05-18T05:01:49.6370460Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 95324 2022-05-18T05:01:49.6491230Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 95325 2022-05-18T05:01:50.6078844Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:50.6683473Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:50.6684010Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:50.7301612Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:50.9535901Z skip: Need at least 4 CUDA devices (1.354s) 2022-05-18T05:01:50.9697286Z test_grid_sharding (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95458 2022-05-18T05:01:50.9809835Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95459 2022-05-18T05:01:50.9924376Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 95460 2022-05-18T05:01:51.0041716Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 95461 2022-05-18T05:01:52.0066037Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:52.0070902Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:52.0078600Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:52.0121263Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:52.2084458Z skip: Need at least 4 CUDA devices (1.255s) 2022-05-18T05:01:52.2247537Z test_multiple_local_shards (__main__.TestShardedTensorEnumerable) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95594 2022-05-18T05:01:52.2359692Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95595 2022-05-18T05:01:52.2475094Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 95596 2022-05-18T05:01:52.2595250Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 95597 2022-05-18T05:01:53.1911219Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:53.1911837Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:53.1920500Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:53.2533751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:53.4637867Z skip: Need at least 4 CUDA devices (1.255s) 2022-05-18T05:01:53.4796107Z test_new_group (__main__.TestShardedTensorEnumerable) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95730 2022-05-18T05:01:53.4907020Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95731 2022-05-18T05:01:53.5020092Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 95732 2022-05-18T05:01:53.5137403Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 95733 2022-05-18T05:01:54.4378027Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:54.4588753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:54.4595197Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:54.5065386Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:54.7181720Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:54.7341608Z test_partial_world_size (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95866 2022-05-18T05:01:54.7450477Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95867 2022-05-18T05:01:54.7562686Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 95868 2022-05-18T05:01:54.7681763Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 95869 2022-05-18T05:01:55.7021497Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:55.7022183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:55.7357280Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:55.7535763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:55.9723359Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:55.9882235Z test_sharded_tensor_metadata (__main__.TestShardedTensorEnumerable) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96002 2022-05-18T05:01:55.9991687Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96003 2022-05-18T05:01:56.0105362Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 96004 2022-05-18T05:01:56.0223057Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 96005 2022-05-18T05:01:56.9325757Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:56.9426663Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:56.9446327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:56.9753925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:57.1264906Z skip: Need at least 4 CUDA devices (1.154s) 2022-05-18T05:01:57.1430326Z test_sharded_tensor_to_cpu (__main__.TestShardedTensorEnumerable) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96138 2022-05-18T05:01:57.1538544Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96139 2022-05-18T05:01:57.1650594Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 96140 2022-05-18T05:01:57.1768172Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 96141 2022-05-18T05:01:58.1280570Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:58.1563152Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:58.1588624Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:58.1876704Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:58.3810576Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:58.3972863Z test_uneven_shards (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96274 2022-05-18T05:01:58.4083284Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96275 2022-05-18T05:01:58.4196202Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 96276 2022-05-18T05:01:58.4311930Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 96277 2022-05-18T05:01:59.3455188Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:59.3537744Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:01:59.3826194Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:59.4266219Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:01:59.6355030Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:01:59.6514347Z test_with_rpc_names (__main__.TestShardedTensorEnumerable) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96410 2022-05-18T05:01:59.6625020Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96411 2022-05-18T05:01:59.6738385Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 96412 2022-05-18T05:01:59.6855604Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 96413 2022-05-18T05:02:00.6104014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:00.6234031Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:00.6694810Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:00.6695583Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:00.8897150Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:02:00.9055449Z test_init_from_local_shards (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96546 2022-05-18T05:02:00.9165174Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96547 2022-05-18T05:02:00.9279896Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 96548 2022-05-18T05:02:00.9395268Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 96549 2022-05-18T05:02:01.8665417Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:01.8682556Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:01.8796862Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:01.8960351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:02.1438146Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:02:02.1604516Z test_init_from_local_shards_and_global_metadata (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96682 2022-05-18T05:02:02.1716522Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96683 2022-05-18T05:02:02.1834283Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 96684 2022-05-18T05:02:02.1958251Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 96685 2022-05-18T05:02:03.2025667Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:03.2152215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:03.2166544Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:03.2243734Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:03.4000853Z skip: Need at least 4 CUDA devices (1.256s) 2022-05-18T05:02:03.4169615Z test_init_from_local_shards_and_global_metadata_invalid_shards (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96818 2022-05-18T05:02:03.4278488Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96819 2022-05-18T05:02:03.4391516Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 96820 2022-05-18T05:02:03.4506114Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 96821 2022-05-18T05:02:04.3938287Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:04.4142806Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:04.4760971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:04.4771835Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:04.6549832Z skip: Need at least 4 CUDA devices (1.255s) 2022-05-18T05:02:04.6700990Z test_init_from_local_shards_invalid_local_shards (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96954 2022-05-18T05:02:04.6811219Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96955 2022-05-18T05:02:04.6923298Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 96956 2022-05-18T05:02:04.7041536Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 96957 2022-05-18T05:02:05.6452415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:05.6596519Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:05.6787962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:05.6837966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:05.9083473Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:02:05.9229522Z test_init_from_local_shards_invalid_pin_memory (__main__.TestShardedTensorFromLocalShards) ... 
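Editor's context note: the TestShardedTensorFromLocalShards cases above assemble a ShardedTensor from per-rank tensors plus shard metadata instead of allocating it through the factory functions, and the *_invalid_* variants deliberately pass inconsistent or overlapping metadata to check the error paths. The per-rank sketch below is written from the public API as I understand it (Shard.from_tensor_and_offsets, init_from_local_shards); signatures should be double-checked against the test file and it assumes an already-initialized multi-rank process group with one GPU per rank.

    import torch
    from torch.distributed._shard.sharded_tensor import Shard, init_from_local_shards

    def build_from_local_shards(rank: int, world_size: int):
        # Each rank owns a contiguous 4-row slice of a (4 * world_size) x 8 global tensor.
        local = torch.randn(4, 8, device=f"cuda:{rank}")
        shard = Shard.from_tensor_and_offsets(
            local,
            shard_offsets=[rank * 4, 0],  # where this shard sits in the global tensor
            rank=rank,
        )
        # Global size is passed positionally; metadata must agree across ranks,
        # which is exactly what the invalid_* tests above violate on purpose.
        return init_from_local_shards([shard], 4 * world_size, 8)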
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97090 2022-05-18T05:02:05.9338026Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97091 2022-05-18T05:02:05.9450723Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97092 2022-05-18T05:02:05.9567120Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97093 2022-05-18T05:02:06.8675278Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:06.8740596Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:06.9473842Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:06.9495177Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:06.9709269Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:06.9763877Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:06.9866716Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T05:02:06.9867559Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:06.9868095Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T05:02:06.9868735Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:06.9870997Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:06.9916187Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:07.1609024Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:02:07.1764011Z test_init_from_local_shards_invalid_property_cross_ranks (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97238 2022-05-18T05:02:07.1873908Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97239 2022-05-18T05:02:07.1986082Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97240 2022-05-18T05:02:07.2103476Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97241 2022-05-18T05:02:08.1942315Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:08.2083079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:08.2181538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:08.2610607Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:08.4146565Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:02:08.4291155Z test_init_from_local_shards_invalid_shards_gaps (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97374 2022-05-18T05:02:08.4399966Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97375 2022-05-18T05:02:08.4514292Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97376 2022-05-18T05:02:08.4629644Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97377 2022-05-18T05:02:09.4427034Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:09.4827724Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:09.4836424Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:09.5443799Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:09.7673756Z skip: Need at least 4 CUDA devices (1.353s) 2022-05-18T05:02:09.7818115Z test_init_from_local_shards_invalid_shards_overlap (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97510 2022-05-18T05:02:09.7927655Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97511 2022-05-18T05:02:09.8040211Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97512 2022-05-18T05:02:09.8160044Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97513 2022-05-18T05:02:10.7395252Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:10.7901551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:10.7906646Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:10.8153637Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:11.0204979Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:02:11.0356016Z test_init_from_local_shards_new_group (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97646 2022-05-18T05:02:11.0465509Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97647 2022-05-18T05:02:11.0578268Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97648 2022-05-18T05:02:11.0692643Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97649 2022-05-18T05:02:11.9786797Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:12.0085999Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:12.0529500Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:12.0840971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:12.2737314Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:02:12.2883539Z test_local_shards (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97782 2022-05-18T05:02:12.2991927Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97783 2022-05-18T05:02:12.3104734Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97784 2022-05-18T05:02:12.3221081Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97785 2022-05-18T05:02:13.3230170Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:13.3361765Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:13.3362584Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:13.3719728Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:13.5262606Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:02:13.5407318Z test_init_from_local_tensor (__main__.TestShardedTensorFromLocalTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97918 2022-05-18T05:02:13.5515598Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97919 2022-05-18T05:02:13.5629731Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97920 2022-05-18T05:02:13.5744406Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97921 2022-05-18T05:02:14.5297969Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:14.5562729Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:14.5700095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:14.5808380Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:14.7790259Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:02:14.7931795Z test_init_from_local_tensor_errors (__main__.TestShardedTensorFromLocalTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98054 2022-05-18T05:02:14.8040716Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98055 2022-05-18T05:02:14.8153639Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98056 2022-05-18T05:02:14.8269935Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98057 2022-05-18T05:02:15.8031114Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:15.8226337Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:15.8523537Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:15.8535522Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:16.0314261Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:02:16.0871756Z test_serialize_and_deserialize (__main__.TestShardedTensorMetadata) ... ok (0.056s) 2022-05-18T05:02:16.0872853Z 2022-05-18T05:02:16.0873265Z ---------------------------------------------------------------------- 2022-05-18T05:02:16.0873609Z Ran 58 tests in 72.331s 2022-05-18T05:02:16.0873776Z 2022-05-18T05:02:16.0873886Z OK (skipped=56) 2022-05-18T05:02:16.0874043Z 2022-05-18T05:02:16.0874189Z Generating XML reports... 
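Nearly every case in the sharded-tensor run above ends in "skip: Need at least 4 CUDA devices": the suite spawns four worker processes per test, but this runner exposes fewer than four GPUs, so 56 of the 58 tests are skipped. As a minimal, hypothetical sketch (not the decorator actually used by test_sharded_tensor.py), such a gate can be written with plain unittest and torch.cuda.device_count():

    import unittest
    import torch

    class RequiresFourGpus(unittest.TestCase):
        # Hypothetical gate mirroring the "skip: Need at least 4 CUDA devices"
        # messages above: the body only runs when at least four devices are visible.
        @unittest.skipIf(torch.cuda.device_count() < 4,
                         "Need at least 4 CUDA devices")
        def test_uses_four_gpus(self):
            for i in range(4):
                self.assertEqual(torch.zeros(1, device=f"cuda:{i}").item(), 0.0)

    if __name__ == "__main__":
        unittest.main(verbosity=2)

Run with fewer than four GPUs, this prints the same "skip: Need at least 4 CUDA devices" line that unittest's verbose runner emits in the log.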
2022-05-18T05:02:16.0916582Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestCreateTensorFromParams-20220518050103.xml 2022-05-18T05:02:16.0919101Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorMetadata-20220518050103.xml 2022-05-18T05:02:16.0923763Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestLocalTensor-20220518050103.xml 2022-05-18T05:02:16.0928142Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestModuleHookApi-20220518050103.xml 2022-05-18T05:02:16.0932818Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardParameter-20220518050103.xml 2022-05-18T05:02:16.0937462Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardTensor-20220518050103.xml 2022-05-18T05:02:16.0967567Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorChunked-20220518050103.xml 2022-05-18T05:02:16.0972849Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorCustomOps-20220518050103.xml 2022-05-18T05:02:16.0989283Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorEnumerable-20220518050103.xml 2022-05-18T05:02:16.1002653Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalShards-20220518050103.xml 2022-05-18T05:02:16.1007238Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalTensor-20220518050103.xml 2022-05-18T05:02:16.3693888Z Running distributed/test_pg_wrapper ... [2022-05-18 05:02:16.368877] 2022-05-18T05:02:16.3694628Z Executing ['/opt/conda/bin/python', 'distributed/test_pg_wrapper.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:02:16.368988] 2022-05-18T05:02:17.2515604Z 2022-05-18T05:02:17.2516303Z 2022-05-18T05:02:17.2518565Z , <__main__.ProcessGroupGlooWrapperTest testMethod=test_collective_shape_mismatch>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collective_shape_mismatch_cuda>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collective_shape_mismatch_cuda_debug_mode>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collective_shape_mismatch_debug_mode>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collectives_op_mismatch>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collectives_op_mismatch_cuda>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collectives_op_mismatch_cuda_debug_mode>, <__main__.ProcessGroupGlooWrapperTest testMethod=test_collectives_op_mismatch_debug_mode>]> 2022-05-18T05:02:17.2520919Z test_collective_hang (__main__.ProcessGroupGlooWrapperTest) 2022-05-18T05:02:17.2521367Z test_collective_shape_mismatch (__main__.ProcessGroupGlooWrapperTest) 2022-05-18T05:02:17.2522035Z test_collective_shape_mismatch_cuda (__main__.ProcessGroupGlooWrapperTest) 2022-05-18T05:02:17.2522527Z test_collective_shape_mismatch_cuda_debug_mode (__main__.ProcessGroupGlooWrapperTest) 2022-05-18T05:02:17.2523014Z test_collective_shape_mismatch_debug_mode (__main__.ProcessGroupGlooWrapperTest) 2022-05-18T05:02:17.2523450Z test_collectives_op_mismatch (__main__.ProcessGroupGlooWrapperTest) 2022-05-18T05:02:17.2523892Z test_collectives_op_mismatch_cuda (__main__.ProcessGroupGlooWrapperTest) 2022-05-18T05:02:17.2524378Z test_collectives_op_mismatch_cuda_debug_mode (__main__.ProcessGroupGlooWrapperTest) 2022-05-18T05:02:17.2524849Z test_collectives_op_mismatch_debug_mode (__main__.ProcessGroupGlooWrapperTest) 2022-05-18T05:02:17.2525853Z , <__main__.ProcessGroupNCCLWrapperTest testMethod=test_collective_shape_mismatch>, <__main__.ProcessGroupNCCLWrapperTest testMethod=test_collective_shape_mismatch_debug_mode>, <__main__.ProcessGroupNCCLWrapperTest testMethod=test_collectives_op_mismatch>, <__main__.ProcessGroupNCCLWrapperTest testMethod=test_collectives_op_mismatch_debug_mode>]> 2022-05-18T05:02:17.2526808Z test_collective_hang (__main__.ProcessGroupNCCLWrapperTest) 2022-05-18T05:02:17.2527234Z test_collective_shape_mismatch (__main__.ProcessGroupNCCLWrapperTest) 2022-05-18T05:02:17.2527694Z test_collective_shape_mismatch_debug_mode (__main__.ProcessGroupNCCLWrapperTest) 2022-05-18T05:02:17.2528283Z test_collectives_op_mismatch (__main__.ProcessGroupNCCLWrapperTest) 2022-05-18T05:02:17.2528731Z test_collectives_op_mismatch_debug_mode (__main__.ProcessGroupNCCLWrapperTest) 2022-05-18T05:02:18.1425405Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:02:18.1440777Z 2022-05-18T05:02:18.1440913Z Running tests... 2022-05-18T05:02:18.1441347Z ---------------------------------------------------------------------- 2022-05-18T05:02:19.7299239Z test_collective_hang (__main__.ProcessGroupGlooWrapperTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:19.7707540Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98258 2022-05-18T05:02:19.7818387Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98259 2022-05-18T05:02:19.7932411Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98260 2022-05-18T05:02:19.8044147Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98261 2022-05-18T05:02:20.6758006Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:20.7286262Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:20.7318780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:20.7355333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:20.7529577Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:20.7597934Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:20.7677681Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T05:02:20.7678173Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T05:02:20.7678972Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:20.7679668Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:20.7701218Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:20.7734358Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:20.8622176Z [E ProcessGroupGloo.cpp:2791] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 2000 ms 2022-05-18T05:02:20.8635490Z [E ProcessGroupGloo.cpp:136] [Rank 0]: Ranks 1 failed to pass monitoredBarrier in 2000 ms 2022-05-18T05:02:20.8739011Z [E ProcessGroupGloo.cpp:136] Rank 2 successfully reached monitoredBarrier, but received errors while waiting for send/recv from rank 0. Please check rank 0 logs for faulty rank. 2022-05-18T05:02:20.8840377Z [E ProcessGroupGloo.cpp:136] Rank 3 successfully reached monitoredBarrier, but received errors while waiting for send/recv from rank 0. Please check rank 0 logs for faulty rank. 2022-05-18T05:02:21.1093472Z ok (2.965s) 2022-05-18T05:02:21.1093686Z 2022-05-18T05:02:21.1094092Z ---------------------------------------------------------------------- 2022-05-18T05:02:21.1094461Z Ran 1 test in 2.965s 2022-05-18T05:02:21.1094611Z 2022-05-18T05:02:21.1094706Z OK 2022-05-18T05:02:21.1094843Z 2022-05-18T05:02:21.1094974Z Generating XML reports... 2022-05-18T05:02:21.1139427Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050218.xml 2022-05-18T05:02:22.2716378Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:02:22.2730682Z 2022-05-18T05:02:22.2731070Z Running tests... 
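The test_collective_hang run above ends with rank 0 reporting "[Rank 0]: Rank 1 failed to pass monitoredBarrier in 2000 ms". torch.distributed.monitored_barrier (supported on the gloo backend) is the call that surfaces this kind of error when a peer never reaches the barrier. A minimal sketch of the mechanism, assuming a local two-process gloo group with hypothetical MASTER_ADDR/MASTER_PORT values; it illustrates the failure mode, it is not the test's actual code:

    import datetime
    import os
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def worker(rank, world_size):
        os.environ["MASTER_ADDR"] = "127.0.0.1"   # hypothetical rendezvous endpoint
        os.environ["MASTER_PORT"] = "29500"
        dist.init_process_group("gloo", rank=rank, world_size=world_size)
        if rank == 0:
            try:
                # Rank 1 never calls the barrier, so this should time out after ~2s,
                # much like the ProcessGroupGloo errors in the log above.
                dist.monitored_barrier(timeout=datetime.timedelta(seconds=2))
            except RuntimeError as err:
                print(f"rank 0 detected a straggler: {err}")
        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(worker, args=(2,), nprocs=2)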
2022-05-18T05:02:22.2731569Z ---------------------------------------------------------------------- 2022-05-18T05:02:23.8457250Z test_collective_shape_mismatch (__main__.ProcessGroupGlooWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:23.8863282Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98465 2022-05-18T05:02:23.8972507Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98466 2022-05-18T05:02:23.9085676Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98467 2022-05-18T05:02:23.9199367Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98468 2022-05-18T05:02:24.7927435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:24.7972681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:24.7986629Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:24.8539328Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:24.8703057Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:24.8792967Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T05:02:24.8793478Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:24.8794189Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T05:02:24.8794980Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:24.8795668Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:24.8806367Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:24.8895971Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:25.3250929Z ok (3.052s) 2022-05-18T05:02:25.3251117Z 2022-05-18T05:02:25.3251527Z ---------------------------------------------------------------------- 2022-05-18T05:02:25.3251850Z Ran 1 test in 3.052s 2022-05-18T05:02:25.3252019Z 2022-05-18T05:02:25.3252126Z OK 2022-05-18T05:02:25.3252272Z 2022-05-18T05:02:25.3252402Z Generating XML reports... 2022-05-18T05:02:25.3295326Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050222.xml 2022-05-18T05:02:26.4681231Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:02:26.4695591Z 2022-05-18T05:02:26.4695812Z Running tests... 2022-05-18T05:02:26.4696303Z ---------------------------------------------------------------------- 2022-05-18T05:02:28.0643660Z test_collective_shape_mismatch_cuda (__main__.ProcessGroupGlooWrapperTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:28.1053061Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98672 2022-05-18T05:02:28.1163188Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98673 2022-05-18T05:02:28.1275142Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98674 2022-05-18T05:02:28.1391063Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98675 2022-05-18T05:02:29.0225751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:29.0595641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:29.0744066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:29.0851847Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:29.2436990Z skip: Need at least 4 CUDA devices (2.774s) 2022-05-18T05:02:29.2437529Z 2022-05-18T05:02:29.2438205Z ---------------------------------------------------------------------- 2022-05-18T05:02:29.2438823Z Ran 1 test in 2.774s 2022-05-18T05:02:29.2439135Z 2022-05-18T05:02:29.2439321Z OK (skipped=1) 2022-05-18T05:02:29.2439575Z 2022-05-18T05:02:29.2439811Z Generating XML reports... 2022-05-18T05:02:29.2483674Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050226.xml 2022-05-18T05:02:30.4081668Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:02:30.4096517Z 2022-05-18T05:02:30.4096663Z Running tests... 2022-05-18T05:02:30.4097102Z ---------------------------------------------------------------------- 2022-05-18T05:02:32.0000719Z test_collective_shape_mismatch_cuda_debug_mode (__main__.ProcessGroupGlooWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:32.0401407Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98843 2022-05-18T05:02:32.0510933Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98844 2022-05-18T05:02:32.0621835Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98845 2022-05-18T05:02:32.0733476Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98846 2022-05-18T05:02:32.9609244Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:32.9727119Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:33.0155870Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:33.0219579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:33.1779278Z skip: Need at least 4 CUDA devices (2.768s) 2022-05-18T05:02:33.1779818Z 2022-05-18T05:02:33.1780513Z ---------------------------------------------------------------------- 2022-05-18T05:02:33.1781119Z Ran 1 test in 2.768s 2022-05-18T05:02:33.1781425Z 2022-05-18T05:02:33.1781616Z OK (skipped=1) 2022-05-18T05:02:33.1781904Z 2022-05-18T05:02:33.1782133Z Generating XML reports... 
2022-05-18T05:02:33.1826378Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050230.xml 2022-05-18T05:02:34.3398471Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:02:34.3412650Z 2022-05-18T05:02:34.3412800Z Running tests... 2022-05-18T05:02:34.3413243Z ---------------------------------------------------------------------- 2022-05-18T05:02:35.9058013Z test_collective_shape_mismatch_debug_mode (__main__.ProcessGroupGlooWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:35.9466284Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99014 2022-05-18T05:02:35.9574974Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99015 2022-05-18T05:02:35.9686730Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99016 2022-05-18T05:02:35.9799938Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99017 2022-05-18T05:02:36.9234115Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:36.9237681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:36.9291149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:36.9328100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:37.0016770Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:37.0017279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T05:02:37.0018004Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:37.0018794Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T05:02:37.0019599Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:37.0020291Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:37.0119818Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:37.0120510Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:37.0640318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T05:02:37.0743330Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T05:02:37.0845324Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 2 2022-05-18T05:02:37.0845874Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 3 2022-05-18T05:02:37.0846614Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-05-18T05:02:37.0847556Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 
2022-05-18T05:02:37.0848231Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-05-18T05:02:37.0947542Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-05-18T05:02:37.3852383Z ok (3.044s) 2022-05-18T05:02:37.3852751Z 2022-05-18T05:02:37.3853219Z ---------------------------------------------------------------------- 2022-05-18T05:02:37.3853564Z Ran 1 test in 3.044s 2022-05-18T05:02:37.3853710Z 2022-05-18T05:02:37.3853805Z OK 2022-05-18T05:02:37.3853940Z 2022-05-18T05:02:37.3854074Z Generating XML reports... 2022-05-18T05:02:37.3897668Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050234.xml 2022-05-18T05:02:38.5364857Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:02:38.5378625Z 2022-05-18T05:02:38.5378986Z Running tests... 2022-05-18T05:02:38.5379860Z ---------------------------------------------------------------------- 2022-05-18T05:02:40.1166571Z test_collectives_op_mismatch (__main__.ProcessGroupGlooWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:40.1572411Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99233 2022-05-18T05:02:40.1681667Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99234 2022-05-18T05:02:40.1791974Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99235 2022-05-18T05:02:40.1905352Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99236 2022-05-18T05:02:41.0737219Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:41.0799497Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:41.0802877Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:41.1015171Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:41.1251856Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:41.1355174Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T05:02:41.1355995Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:41.1356499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T05:02:41.1357287Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:41.1357993Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:41.1358667Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:41.1359352Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
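The *_debug_mode variants in this suite run the same collectives with the process-group wrapper enabled; that is also consistent with the extra store_based_barrier_key:2 round visible only in the debug-mode runs above, where a helper group is set up alongside the main one. A hedged sketch of the behaviour these tests exercise, using the public TORCH_DISTRIBUTED_DEBUG=DETAIL knob rather than the test's internal helpers (the rendezvous endpoint is hypothetical, and the mismatch is expected to be reported as a RuntimeError rather than guaranteed to look exactly like the log):

    import os
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def worker(rank, world_size):
        os.environ["MASTER_ADDR"] = "127.0.0.1"   # hypothetical rendezvous endpoint
        os.environ["MASTER_PORT"] = "29501"
        dist.init_process_group("gloo", rank=rank, world_size=world_size)
        t = torch.ones(2)
        try:
            if rank == 0:
                dist.all_reduce(t)            # rank 0 issues an ALLREDUCE ...
            else:
                dist.broadcast(t, src=0)      # ... while rank 1 issues a BROADCAST
        except RuntimeError as err:
            # With DETAIL debug mode the mismatched collectives should be flagged
            # instead of silently hanging or corrupting data.
            print(f"rank {rank} collective mismatch: {err}")
        dist.destroy_process_group()

    if __name__ == "__main__":
        # Set before spawning so both workers inherit the debug level.
        os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"
        mp.spawn(worker, args=(2,), nprocs=2)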
2022-05-18T05:02:41.4956646Z ok (2.957s) 2022-05-18T05:02:41.4956929Z 2022-05-18T05:02:41.4957534Z ---------------------------------------------------------------------- 2022-05-18T05:02:41.4957873Z Ran 1 test in 2.958s 2022-05-18T05:02:41.4958039Z 2022-05-18T05:02:41.4958131Z OK 2022-05-18T05:02:41.4958265Z 2022-05-18T05:02:41.4958398Z Generating XML reports... 2022-05-18T05:02:41.5001418Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050238.xml 2022-05-18T05:02:42.6356806Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:02:42.6369637Z 2022-05-18T05:02:42.6369781Z Running tests... 2022-05-18T05:02:42.6370538Z ---------------------------------------------------------------------- 2022-05-18T05:02:44.1813210Z test_collectives_op_mismatch_cuda (__main__.ProcessGroupGlooWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:44.2215511Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99440 2022-05-18T05:02:44.2321401Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99441 2022-05-18T05:02:44.2431637Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99442 2022-05-18T05:02:44.2542378Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99443 2022-05-18T05:02:45.1367345Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:45.1398261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:45.1468477Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:45.2054476Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:45.3588390Z skip: Need at least 4 CUDA devices (2.721s) 2022-05-18T05:02:45.3588654Z 2022-05-18T05:02:45.3589255Z ---------------------------------------------------------------------- 2022-05-18T05:02:45.3589599Z Ran 1 test in 2.722s 2022-05-18T05:02:45.3589763Z 2022-05-18T05:02:45.3589872Z OK (skipped=1) 2022-05-18T05:02:45.3590033Z 2022-05-18T05:02:45.3590163Z Generating XML reports... 2022-05-18T05:02:45.3632887Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050242.xml 2022-05-18T05:02:46.5100651Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:02:46.5115260Z 2022-05-18T05:02:46.5115512Z Running tests... 2022-05-18T05:02:46.5115981Z ---------------------------------------------------------------------- 2022-05-18T05:02:48.0820768Z test_collectives_op_mismatch_cuda_debug_mode (__main__.ProcessGroupGlooWrapperTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:48.1218069Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99611 2022-05-18T05:02:48.1327768Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99612 2022-05-18T05:02:48.1437756Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99613 2022-05-18T05:02:48.1547187Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99614 2022-05-18T05:02:49.0788399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:49.0804464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:49.1034512Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:49.1111387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:49.2593003Z skip: Need at least 4 CUDA devices (2.747s) 2022-05-18T05:02:49.2593324Z 2022-05-18T05:02:49.2593726Z ---------------------------------------------------------------------- 2022-05-18T05:02:49.2594064Z Ran 1 test in 2.748s 2022-05-18T05:02:49.2594230Z 2022-05-18T05:02:49.2594322Z OK (skipped=1) 2022-05-18T05:02:49.2594484Z 2022-05-18T05:02:49.2594708Z Generating XML reports... 2022-05-18T05:02:49.2638986Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050246.xml 2022-05-18T05:02:50.3977884Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:02:50.3992513Z 2022-05-18T05:02:50.3992949Z Running tests... 2022-05-18T05:02:50.3993775Z ---------------------------------------------------------------------- 2022-05-18T05:02:51.9869612Z test_collectives_op_mismatch_debug_mode (__main__.ProcessGroupGlooWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:52.0269073Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99782 2022-05-18T05:02:52.0376598Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99783 2022-05-18T05:02:52.0486264Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99784 2022-05-18T05:02:52.0597346Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99785 2022-05-18T05:02:53.0318316Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:53.0424068Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:02:53.0493596Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:53.0718695Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:02:53.1249647Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:53.1351631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:53.1453948Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T05:02:53.1454467Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T05:02:53.1455277Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
2022-05-18T05:02:53.1456167Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:53.1456866Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:53.1556185Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:02:53.2279207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T05:02:53.2380207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T05:02:53.2380734Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 3 2022-05-18T05:02:53.2381433Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 2 2022-05-18T05:02:53.2382149Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-05-18T05:02:53.2383373Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-05-18T05:02:53.2384450Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-05-18T05:02:53.2385163Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes. 2022-05-18T05:02:53.5651454Z ok (3.166s) 2022-05-18T05:02:53.5651991Z 2022-05-18T05:02:53.5652435Z ---------------------------------------------------------------------- 2022-05-18T05:02:53.5652787Z Ran 1 test in 3.166s 2022-05-18T05:02:53.5652953Z 2022-05-18T05:02:53.5653048Z OK 2022-05-18T05:02:53.5653182Z 2022-05-18T05:02:53.5653338Z Generating XML reports... 2022-05-18T05:02:53.5697036Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050250.xml 2022-05-18T05:02:54.7186713Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:02:54.7200978Z 2022-05-18T05:02:54.7201441Z Running tests... 2022-05-18T05:02:54.7201862Z ---------------------------------------------------------------------- 2022-05-18T05:02:56.3014275Z test_collective_hang (__main__.ProcessGroupNCCLWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:56.3412891Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100001 2022-05-18T05:02:56.3518954Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100002 2022-05-18T05:02:57.2430702Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:57.2432627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:57.2443913Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:57.2447542Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:57.2448339Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:02:57.2535866Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:57.2847753Z [E ProcessGroupGloo.cpp:2791] [Rank 0]: Rank 1 failed to pass monitoredBarrier in 2000 ms 2022-05-18T05:02:57.2848243Z [E ProcessGroupGloo.cpp:136] [Rank 0]: Ranks 1 failed to pass monitoredBarrier in 2000 ms 2022-05-18T05:02:57.4560895Z ok (2.736s) 2022-05-18T05:02:57.4561111Z 2022-05-18T05:02:57.4561512Z ---------------------------------------------------------------------- 2022-05-18T05:02:57.4561855Z Ran 1 test in 2.736s 2022-05-18T05:02:57.4562023Z 2022-05-18T05:02:57.4562119Z OK 2022-05-18T05:02:57.4562254Z 2022-05-18T05:02:57.4562370Z Generating XML reports... 2022-05-18T05:02:57.4604569Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20220518050254.xml 2022-05-18T05:02:58.6034910Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:02:58.6048760Z 2022-05-18T05:02:58.6048988Z Running tests... 2022-05-18T05:02:58.6049427Z ---------------------------------------------------------------------- 2022-05-18T05:03:00.1825221Z test_collective_shape_mismatch (__main__.ProcessGroupNCCLWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:00.2228313Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100114 2022-05-18T05:03:00.2339964Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100115 2022-05-18T05:03:01.1106491Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:01.1108944Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:01.1235959Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:01.1240064Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:01.1241515Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:01.1313754Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:02.8414601Z ok (4.236s) 2022-05-18T05:03:02.8414822Z 2022-05-18T05:03:02.8415230Z ---------------------------------------------------------------------- 2022-05-18T05:03:02.8415555Z Ran 1 test in 4.237s 2022-05-18T05:03:02.8415742Z 2022-05-18T05:03:02.8415834Z OK 2022-05-18T05:03:02.8415973Z 2022-05-18T05:03:02.8416105Z Generating XML reports... 2022-05-18T05:03:02.8458434Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20220518050258.xml 2022-05-18T05:03:04.0071168Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:03:04.0085196Z 2022-05-18T05:03:04.0085434Z Running tests... 2022-05-18T05:03:04.0085862Z ---------------------------------------------------------------------- 2022-05-18T05:03:05.5985925Z test_collective_shape_mismatch_debug_mode (__main__.ProcessGroupNCCLWrapperTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:05.6394533Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100243 2022-05-18T05:03:05.6504917Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100244 2022-05-18T05:03:06.5341785Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:06.5686653Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:06.5857203Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:06.5857770Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:06.5858584Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:06.5859285Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:06.5965525Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T05:03:06.5966038Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T05:03:06.5966718Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T05:03:06.5967925Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T05:03:08.3583656Z ok (4.349s) 2022-05-18T05:03:08.3583989Z 2022-05-18T05:03:08.3584719Z ---------------------------------------------------------------------- 2022-05-18T05:03:08.3585074Z Ran 1 test in 4.350s 2022-05-18T05:03:08.3585244Z 2022-05-18T05:03:08.3585340Z OK 2022-05-18T05:03:08.3585474Z 2022-05-18T05:03:08.3585592Z Generating XML reports... 2022-05-18T05:03:08.3627920Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20220518050304.xml 2022-05-18T05:03:09.5387568Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:03:09.5402280Z 2022-05-18T05:03:09.5402555Z Running tests... 2022-05-18T05:03:09.5403303Z ---------------------------------------------------------------------- 2022-05-18T05:03:11.1268718Z test_collectives_op_mismatch (__main__.ProcessGroupNCCLWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:11.1680081Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100378 2022-05-18T05:03:11.1791015Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100379 2022-05-18T05:03:12.0693408Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:12.0695919Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:12.0707052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:12.0711278Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:12.0712395Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
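The "Added key: store_based_barrier_key:N" / "Completed store-based barrier" pairs that recur throughout this log (including the line just above) are emitted by torch.distributed during process-group initialization: each rank writes a key into the rendezvous store and then waits until all world_size ranks have done the same. A minimal sketch of that rendezvous with an explicit TCPStore (host and port are hypothetical):

    import torch.distributed as dist
    import torch.multiprocessing as mp

    def worker(rank, world_size):
        # One rank hosts the store; the others connect to it (hypothetical port).
        store = dist.TCPStore("127.0.0.1", 29502, world_size, rank == 0)
        # init_process_group performs the store-based barrier that produces the
        # "Added key: store_based_barrier_key:1 ... Completed store-based barrier"
        # INFO lines seen in this log.
        dist.init_process_group("gloo", store=store, rank=rank, world_size=world_size)
        dist.barrier()
        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(worker, args=(2,), nprocs=2)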
2022-05-18T05:03:12.0799184Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:13.7865358Z ok (4.246s) 2022-05-18T05:03:13.7865570Z 2022-05-18T05:03:13.7865984Z ---------------------------------------------------------------------- 2022-05-18T05:03:13.7866680Z Ran 1 test in 4.246s 2022-05-18T05:03:13.7866854Z 2022-05-18T05:03:13.7866953Z OK 2022-05-18T05:03:13.7867089Z 2022-05-18T05:03:13.7867207Z Generating XML reports... 2022-05-18T05:03:13.7909580Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20220518050309.xml 2022-05-18T05:03:14.9557632Z Test results will be stored in test-reports/python-unittest/distributed.test_pg_wrapper 2022-05-18T05:03:14.9571503Z 2022-05-18T05:03:14.9571645Z Running tests... 2022-05-18T05:03:14.9572447Z ---------------------------------------------------------------------- 2022-05-18T05:03:16.5615050Z test_collectives_op_mismatch_debug_mode (__main__.ProcessGroupNCCLWrapperTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:16.6011690Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100507 2022-05-18T05:03:16.6120651Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100508 2022-05-18T05:03:17.5041439Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:17.5375551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:17.5587560Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:17.5588199Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:17.5589030Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:17.5589726Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:17.5796172Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T05:03:17.5796928Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T05:03:17.5797611Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T05:03:17.5798378Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T05:03:19.3197444Z ok (4.362s) 2022-05-18T05:03:19.3197863Z 2022-05-18T05:03:19.3198276Z ---------------------------------------------------------------------- 2022-05-18T05:03:19.3198621Z Ran 1 test in 4.363s 2022-05-18T05:03:19.3198788Z 2022-05-18T05:03:19.3198865Z OK 2022-05-18T05:03:19.3199297Z 2022-05-18T05:03:19.3199451Z Generating XML reports... 2022-05-18T05:03:19.3241178Z Generated XML report: test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20220518050314.xml 2022-05-18T05:03:19.7279685Z Running distributed/fsdp/test_fsdp_grad_acc ... [2022-05-18 05:03:19.727475] 2022-05-18T05:03:19.7280434Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_grad_acc.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:03:19.727588] 2022-05-18T05:03:20.6542887Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_grad_acc 2022-05-18T05:03:20.6560278Z 2022-05-18T05:03:20.6560567Z Running tests... 2022-05-18T05:03:20.6560995Z ---------------------------------------------------------------------- 2022-05-18T05:03:20.6575532Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_POST (__main__.TestGradAcc) 2022-05-18T05:03:22.2349835Z Tests gradient accumulation. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:22.2756293Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100642 2022-05-18T05:03:22.2870852Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100643 2022-05-18T05:03:23.1983257Z dist init r=0, world=2 2022-05-18T05:03:23.1986666Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:23.2259187Z dist init r=1, world=2 2022-05-18T05:03:23.2263631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:23.2264707Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:23.2293001Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:24.6044739Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:24.6045261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:24.6333165Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:03:24.6334025Z warnings.warn( 2022-05-18T05:03:24.6334803Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:03:24.6335321Z warnings.warn( 2022-05-18T05:03:25.1952945Z ok (4.539s) 2022-05-18T05:03:25.1967524Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE (__main__.TestGradAcc) 2022-05-18T05:03:25.2103479Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100725 2022-05-18T05:03:25.2212054Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100726 2022-05-18T05:03:26.1514586Z dist init r=1, world=2 2022-05-18T05:03:26.1517747Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:26.1835291Z dist init r=0, world=2 2022-05-18T05:03:26.1839532Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:26.1840663Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:03:26.1926486Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:27.5401981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:27.5402516Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:27.5693259Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:03:27.5693863Z warnings.warn( 2022-05-18T05:03:27.5694608Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:03:27.5695156Z warnings.warn( 2022-05-18T05:03:28.2291809Z ok (3.034s) 2022-05-18T05:03:28.2306626Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_None (__main__.TestGradAcc) 2022-05-18T05:03:28.2443791Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100808 2022-05-18T05:03:28.2554792Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100809 2022-05-18T05:03:29.1598856Z dist init r=1, world=2 2022-05-18T05:03:29.1601690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:29.1992845Z dist init r=0, world=2 2022-05-18T05:03:29.1997116Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:29.1998267Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:29.2009034Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:30.5306685Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:30.5307213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:30.5612426Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:03:30.5613137Z warnings.warn( 2022-05-18T05:03:30.5613892Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:03:30.5614439Z warnings.warn( 2022-05-18T05:03:31.1631876Z ok (2.934s) 2022-05-18T05:03:31.1646047Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_POST (__main__.TestGradAcc) 2022-05-18T05:03:31.1779380Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100891 2022-05-18T05:03:31.1888582Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100892 2022-05-18T05:03:32.1097874Z dist init r=0, world=2 2022-05-18T05:03:32.1100514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:32.1213499Z dist init r=1, world=2 2022-05-18T05:03:32.1217810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:32.1219289Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:32.1305918Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:33.4614988Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:33.4615532Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:33.7958449Z ok (2.633s) 2022-05-18T05:03:33.7973215Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE (__main__.TestGradAcc) 2022-05-18T05:03:33.8116884Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100970 2022-05-18T05:03:33.8227492Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100971 2022-05-18T05:03:34.7397970Z dist init r=0, world=2 2022-05-18T05:03:34.7401115Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:34.7678607Z dist init r=1, world=2 2022-05-18T05:03:34.7682804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:34.7683758Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:34.7707488Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:36.0872551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:36.0873120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:36.3294748Z ok (2.533s) 2022-05-18T05:03:36.3309126Z test_grad_acc_configs_[(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_None (__main__.TestGradAcc) 2022-05-18T05:03:36.3441592Z Tests gradient accumulation. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101049 2022-05-18T05:03:36.3550572Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101050 2022-05-18T05:03:37.2700458Z dist init r=1, world=2 2022-05-18T05:03:37.2703450Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:37.2910826Z dist init r=0, world=2 2022-05-18T05:03:37.2915821Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:37.2916629Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:37.3009478Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:38.6449666Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:38.6450241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:38.9619623Z ok (2.632s) 2022-05-18T05:03:38.9634348Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_POST (__main__.TestGradAcc) 2022-05-18T05:03:38.9765705Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101128 2022-05-18T05:03:38.9873739Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101129 2022-05-18T05:03:39.9056971Z dist init r=1, world=2 2022-05-18T05:03:39.9060301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:39.9075921Z dist init r=0, world=2 2022-05-18T05:03:39.9080038Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:39.9081011Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:39.9163755Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:41.2607222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:41.2607765Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:41.2892874Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:03:41.2893480Z warnings.warn( 2022-05-18T05:03:41.2897132Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:03:41.2897683Z warnings.warn( 2022-05-18T05:03:41.8949923Z ok (2.933s) 2022-05-18T05:03:41.8964402Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE (__main__.TestGradAcc) 2022-05-18T05:03:41.9097808Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101211 2022-05-18T05:03:41.9207299Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101212 2022-05-18T05:03:42.8435681Z dist init r=0, world=2 2022-05-18T05:03:42.8438941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:42.8792946Z dist init r=1, world=2 2022-05-18T05:03:42.8797745Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:42.8798910Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:42.8847147Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:44.2256868Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:44.2257390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:44.2532388Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:03:44.2532971Z warnings.warn( 2022-05-18T05:03:44.2539085Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:03:44.2539634Z warnings.warn( 2022-05-18T05:03:44.8283069Z ok (2.933s) 2022-05-18T05:03:44.8297021Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_None (__main__.TestGradAcc) 2022-05-18T05:03:44.8429781Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101294 2022-05-18T05:03:44.8536718Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101295 2022-05-18T05:03:45.8114679Z dist init r=1, world=2 2022-05-18T05:03:45.8117850Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:45.8166834Z dist init r=0, world=2 2022-05-18T05:03:45.8170718Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:45.8171817Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:45.8221188Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:03:47.1633996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:47.1634562Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:47.1933289Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:03:47.1933890Z warnings.warn( 2022-05-18T05:03:47.1934638Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:03:47.1935481Z warnings.warn( 2022-05-18T05:03:47.7613232Z ok (2.933s) 2022-05-18T05:03:47.7628467Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_POST (__main__.TestGradAcc) 2022-05-18T05:03:47.7761078Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101377 2022-05-18T05:03:47.7871914Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101378 2022-05-18T05:03:48.7194819Z dist init r=1, world=2 2022-05-18T05:03:48.7198036Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:48.7454241Z dist init r=0, world=2 2022-05-18T05:03:48.7458504Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:48.7459635Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:48.7504829Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:50.0794362Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:50.0794900Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:50.3941227Z ok (2.633s) 2022-05-18T05:03:50.3955999Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE (__main__.TestGradAcc) 2022-05-18T05:03:50.4084301Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101456 2022-05-18T05:03:50.4192686Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101457 2022-05-18T05:03:51.3373780Z dist init r=0, world=2 2022-05-18T05:03:51.3376406Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:51.3400645Z dist init r=1, world=2 2022-05-18T05:03:51.3405159Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:51.3406447Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
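Note on the parameterized TestGradAcc cases running above: each name encodes the FSDP options under test — whether the accumulation iterations run inside no_sync() (use_no_sync), whether parameters are offloaded to CPU (CPUOffload(offload_params=...)), and the backward prefetch policy (BackwardPrefetch.BACKWARD_PRE, BACKWARD_POST, or None). A minimal sketch of that pattern follows; it assumes torch.distributed is already initialized on a CUDA device (as the test harness does per subprocess), and the model/batch names are illustrative only, not taken from the test file.

import torch
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    CPUOffload,
    BackwardPrefetch,
)

def accumulate_then_step(model: nn.Module, batches, lr: float = 1e-2):
    # Wrap with the same knobs the test names refer to (values here are one
    # example configuration out of the grid exercised above).
    fsdp_model = FSDP(
        model.cuda(),
        cpu_offload=CPUOffload(offload_params=False),
        backward_prefetch=BackwardPrefetch.BACKWARD_PRE,
    )
    optim = torch.optim.SGD(fsdp_model.parameters(), lr=lr)
    # use_no_sync=True: accumulate gradients locally, skipping gradient
    # communication for all but the last micro-batch...
    with fsdp_model.no_sync():
        for batch in batches[:-1]:
            fsdp_model(batch).sum().backward()
    # ...then synchronize gradients on the final micro-batch and step.
    fsdp_model(batches[-1]).sum().backward()
    optim.step()
    optim.zero_grad()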
2022-05-18T05:03:51.3479988Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:52.6894239Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:52.6894778Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:52.9260675Z ok (2.532s) 2022-05-18T05:03:52.9274669Z test_grad_acc_configs_[(use_no_sync=True,num_iters=3),(use_no_sync=False,num_iters=3),(use_no_sync=True,num_iters=3)]_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_None (__main__.TestGradAcc) 2022-05-18T05:03:52.9406372Z Tests gradient accumulation. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101535 2022-05-18T05:03:52.9514129Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101536 2022-05-18T05:03:53.9154536Z dist init r=1, world=2 2022-05-18T05:03:53.9157438Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:53.9248211Z dist init r=0, world=2 2022-05-18T05:03:53.9252539Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:53.9253599Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:53.9260300Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:55.2379564Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:55.2380104Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:55.5583832Z ok (2.632s) 2022-05-18T05:03:55.5584037Z 2022-05-18T05:03:55.5584495Z ---------------------------------------------------------------------- 2022-05-18T05:03:55.5584826Z Ran 12 tests in 34.902s 2022-05-18T05:03:55.5585002Z 2022-05-18T05:03:55.5585101Z OK 2022-05-18T05:03:55.5585232Z 2022-05-18T05:03:55.5585386Z Generating XML reports... 2022-05-18T05:03:55.5658496Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_grad_acc/TEST-TestGradAcc-20220518050320.xml 2022-05-18T05:03:55.8352525Z Running distributed/test_c10d_spawn_gloo ... [2022-05-18 05:03:55.834794] 2022-05-18T05:03:55.8353471Z Executing ['/opt/conda/bin/python', 'distributed/test_c10d_spawn_gloo.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:03:55.834898] 2022-05-18T05:03:56.7454143Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4fvxhe0l 2022-05-18T05:03:56.7455589Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4fvxhe0l/_remote_module_non_scriptable.py 2022-05-18T05:03:58.3245084Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:58.3286797Z <unittest.suite.TestSuite tests=[<__main__.DistributedDataParallelSingleProcessTest testMethod=test_cpu>, <__main__.DistributedDataParallelSingleProcessTest testMethod=test_cuda>, <__main__.DistributedDataParallelSingleProcessTest testMethod=test_rnn>]> 2022-05-18T05:03:58.3288279Z test_cpu (__main__.DistributedDataParallelSingleProcessTest) 2022-05-18T05:03:58.3288982Z test_cuda (__main__.DistributedDataParallelSingleProcessTest) 2022-05-18T05:03:58.3289650Z test_rnn (__main__.DistributedDataParallelSingleProcessTest) 2022-05-18T05:03:58.3290053Z 2022-05-18T05:03:58.3290378Z 2022-05-18T05:03:58.3292929Z <unittest.suite.TestSuite tests=[<__main__.TestDistributedNNFunctionsGloo testMethod=test_all_gather>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_all_to_all>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_all_to_all_single>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_allreduce>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_broadcast>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_gather>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_reduce>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_scatter>]> 2022-05-18T05:03:58.3294132Z test_all_gather (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:03:58.3294516Z test_all_to_all (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:03:58.3294936Z test_all_to_all_single (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:03:58.3295349Z test_allreduce (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:03:58.3295733Z test_broadcast (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:03:58.3296125Z test_gather (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:03:58.3296515Z test_reduce (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:03:58.3296884Z test_scatter (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:03:59.2093282Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb0urcqc2 2022-05-18T05:03:59.2094143Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb0urcqc2/_remote_module_non_scriptable.py 2022-05-18T05:04:00.7567379Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:00.7628319Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:04:00.7646416Z 2022-05-18T05:04:00.7646605Z Running tests... 2022-05-18T05:04:00.7647018Z ---------------------------------------------------------------------- 2022-05-18T05:04:00.7722551Z test_cpu (__main__.DistributedDataParallelSingleProcessTest) ... INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:04:00.7876323Z ok (0.023s) 2022-05-18T05:04:00.7878062Z 2022-05-18T05:04:00.7878910Z ---------------------------------------------------------------------- 2022-05-18T05:04:00.7879302Z Ran 1 test in 0.023s 2022-05-18T05:04:00.7879454Z 2022-05-18T05:04:00.7879563Z OK 2022-05-18T05:04:00.7879697Z 2022-05-18T05:04:00.7879828Z Generating XML reports... 
2022-05-18T05:04:00.7915305Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518050400.xml 2022-05-18T05:04:01.9336205Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl_2ua5yj 2022-05-18T05:04:01.9337357Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl_2ua5yj/_remote_module_non_scriptable.py 2022-05-18T05:04:03.5356096Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:03.5418178Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:04:03.5435207Z 2022-05-18T05:04:03.5435685Z Running tests... 2022-05-18T05:04:03.5436155Z ---------------------------------------------------------------------- 2022-05-18T05:04:03.7174384Z test_cuda (__main__.DistributedDataParallelSingleProcessTest) ... INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:04:03.7392502Z ok (0.196s) 2022-05-18T05:04:03.7394032Z 2022-05-18T05:04:03.7394757Z ---------------------------------------------------------------------- 2022-05-18T05:04:03.7395399Z Ran 1 test in 0.196s 2022-05-18T05:04:03.7395702Z 2022-05-18T05:04:03.7395846Z OK 2022-05-18T05:04:03.7396085Z 2022-05-18T05:04:03.7396318Z Generating XML reports... 2022-05-18T05:04:03.7433243Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518050403.xml 2022-05-18T05:04:04.8922229Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplmdz7pzm 2022-05-18T05:04:04.8923130Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplmdz7pzm/_remote_module_non_scriptable.py 2022-05-18T05:04:06.4647590Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:06.4707479Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:04:06.4722836Z 2022-05-18T05:04:06.4723036Z Running tests... 2022-05-18T05:04:06.4723449Z ---------------------------------------------------------------------- 2022-05-18T05:04:07.2950640Z test_rnn (__main__.DistributedDataParallelSingleProcessTest) ... INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:04:07.7747420Z ok (1.302s) 2022-05-18T05:04:07.7748432Z 2022-05-18T05:04:07.7748930Z ---------------------------------------------------------------------- 2022-05-18T05:04:07.7749277Z Ran 1 test in 1.303s 2022-05-18T05:04:07.7749465Z 2022-05-18T05:04:07.7749562Z OK 2022-05-18T05:04:07.7749697Z 2022-05-18T05:04:07.7749826Z Generating XML reports... 2022-05-18T05:04:07.7784480Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518050406.xml 2022-05-18T05:04:08.9753564Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp080qfewc 2022-05-18T05:04:08.9754382Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp080qfewc/_remote_module_non_scriptable.py 2022-05-18T05:04:10.5504009Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:10.5565376Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:04:10.5580933Z 2022-05-18T05:04:10.5581320Z Running tests... 
2022-05-18T05:04:10.5581831Z ---------------------------------------------------------------------- 2022-05-18T05:04:10.5949477Z test_all_gather (__main__.TestDistributedNNFunctionsGloo) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101800 2022-05-18T05:04:10.6049212Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101801 2022-05-18T05:04:11.5094702Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplweh_dbb 2022-05-18T05:04:11.5095923Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplweh_dbb/_remote_module_non_scriptable.py 2022-05-18T05:04:11.5133644Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm156bvhv 2022-05-18T05:04:11.5136458Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm156bvhv/_remote_module_non_scriptable.py 2022-05-18T05:04:13.1433824Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:13.1473565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:13.1582281Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:13.1622321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:13.1833887Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:13.1834646Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:13.1835449Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:13.1836127Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:14.6154507Z ok (4.057s) 2022-05-18T05:04:14.6154856Z 2022-05-18T05:04:14.6155640Z ---------------------------------------------------------------------- 2022-05-18T05:04:14.6156139Z Ran 1 test in 4.057s 2022-05-18T05:04:14.6156314Z 2022-05-18T05:04:14.6156410Z OK 2022-05-18T05:04:14.6156544Z 2022-05-18T05:04:14.6156682Z Generating XML reports... 2022-05-18T05:04:14.6198523Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050410.xml 2022-05-18T05:04:15.7745080Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgig2puur 2022-05-18T05:04:15.7746250Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgig2puur/_remote_module_non_scriptable.py 2022-05-18T05:04:17.3272894Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:17.3332515Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:04:17.3347661Z 2022-05-18T05:04:17.3348100Z Running tests... 2022-05-18T05:04:17.3349038Z ---------------------------------------------------------------------- 2022-05-18T05:04:17.3710014Z test_all_to_all (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101916 2022-05-18T05:04:17.3809424Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101917 2022-05-18T05:04:18.2699849Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbwi0_zhn 2022-05-18T05:04:18.2701178Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbwi0_zhn/_remote_module_non_scriptable.py 2022-05-18T05:04:18.2725851Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphjfw62fo 2022-05-18T05:04:18.2728733Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphjfw62fo/_remote_module_non_scriptable.py 2022-05-18T05:04:19.9046058Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:19.9086380Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:19.9214512Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:19.9254215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:19.9398795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:19.9399724Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:19.9400508Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:19.9401204Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:21.2912887Z ok (3.956s) 2022-05-18T05:04:21.2913256Z 2022-05-18T05:04:21.2913756Z ---------------------------------------------------------------------- 2022-05-18T05:04:21.2914107Z Ran 1 test in 3.956s 2022-05-18T05:04:21.2914292Z 2022-05-18T05:04:21.2914387Z OK 2022-05-18T05:04:21.2914503Z 2022-05-18T05:04:21.2914636Z Generating XML reports... 2022-05-18T05:04:21.2957276Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050417.xml 2022-05-18T05:04:22.4768247Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpui16rrja 2022-05-18T05:04:22.4769380Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpui16rrja/_remote_module_non_scriptable.py 2022-05-18T05:04:24.0524108Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:24.0583862Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:04:24.0599410Z 2022-05-18T05:04:24.0599685Z Running tests... 2022-05-18T05:04:24.0961483Z ---------------------------------------------------------------------- 2022-05-18T05:04:24.0962106Z test_all_to_all_single (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102032 2022-05-18T05:04:24.1061288Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102033 2022-05-18T05:04:24.9891613Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3x4jmevu 2022-05-18T05:04:24.9892902Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3x4jmevu/_remote_module_non_scriptable.py 2022-05-18T05:04:24.9912015Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqxeyy97o 2022-05-18T05:04:24.9914924Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqxeyy97o/_remote_module_non_scriptable.py 2022-05-18T05:04:26.6248850Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:26.6288944Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:26.6447371Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:26.6487668Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:26.6601777Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:26.6602499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:26.6603362Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:26.6604079Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:28.0165906Z ok (3.956s) 2022-05-18T05:04:28.0166175Z 2022-05-18T05:04:28.0166769Z ---------------------------------------------------------------------- 2022-05-18T05:04:28.0167121Z Ran 1 test in 3.957s 2022-05-18T05:04:28.0167286Z 2022-05-18T05:04:28.0167382Z OK 2022-05-18T05:04:28.0167527Z 2022-05-18T05:04:28.0167661Z Generating XML reports... 2022-05-18T05:04:28.0209746Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050424.xml 2022-05-18T05:04:29.1907377Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcir1lwxh 2022-05-18T05:04:29.1908860Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcir1lwxh/_remote_module_non_scriptable.py 2022-05-18T05:04:30.7640540Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:30.7700378Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:04:30.7715998Z 2022-05-18T05:04:30.7716141Z Running tests... 2022-05-18T05:04:30.7716816Z ---------------------------------------------------------------------- 2022-05-18T05:04:30.8083745Z test_allreduce (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102148 2022-05-18T05:04:30.8184761Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102149 2022-05-18T05:04:31.7461538Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfaqocrw5 2022-05-18T05:04:31.7462818Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfaqocrw5/_remote_module_non_scriptable.py 2022-05-18T05:04:31.7736472Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmped4axwu0 2022-05-18T05:04:31.7739595Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmped4axwu0/_remote_module_non_scriptable.py 2022-05-18T05:04:33.3561855Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:33.3601228Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:33.3984010Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:33.4024232Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:33.4217698Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:33.4218244Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:33.4219042Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:33.4219718Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:34.8291395Z ok (4.057s) 2022-05-18T05:04:34.8291615Z 2022-05-18T05:04:34.8291994Z ---------------------------------------------------------------------- 2022-05-18T05:04:34.8292343Z Ran 1 test in 4.057s 2022-05-18T05:04:34.8292511Z 2022-05-18T05:04:34.8292608Z OK 2022-05-18T05:04:34.8292744Z 2022-05-18T05:04:34.8292876Z Generating XML reports... 2022-05-18T05:04:34.8336394Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050430.xml 2022-05-18T05:04:36.0039960Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgb58p5hk 2022-05-18T05:04:36.0040614Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgb58p5hk/_remote_module_non_scriptable.py 2022-05-18T05:04:37.5456017Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:37.5516636Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:04:37.5532515Z 2022-05-18T05:04:37.5532861Z Running tests... 2022-05-18T05:04:37.5533363Z ---------------------------------------------------------------------- 2022-05-18T05:04:37.5895787Z test_broadcast (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102264 2022-05-18T05:04:37.5996846Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102265 2022-05-18T05:04:38.4938323Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcaegd76i 2022-05-18T05:04:38.4940016Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcaegd76i/_remote_module_non_scriptable.py 2022-05-18T05:04:38.5047633Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqw0oet09 2022-05-18T05:04:38.5050065Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqw0oet09/_remote_module_non_scriptable.py 2022-05-18T05:04:40.1343068Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:40.1375747Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:40.1385037Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:40.1414379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:40.1624631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:40.1625147Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:40.1625962Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:40.1626660Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:41.5112793Z ok (3.958s) 2022-05-18T05:04:41.5113016Z 2022-05-18T05:04:41.5113413Z ---------------------------------------------------------------------- 2022-05-18T05:04:41.5113782Z Ran 1 test in 3.958s 2022-05-18T05:04:41.5113951Z 2022-05-18T05:04:41.5114027Z OK 2022-05-18T05:04:41.5114161Z 2022-05-18T05:04:41.5114293Z Generating XML reports... 2022-05-18T05:04:41.5156639Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050437.xml 2022-05-18T05:04:42.6801406Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn47haqr9 2022-05-18T05:04:42.6802312Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn47haqr9/_remote_module_non_scriptable.py 2022-05-18T05:04:44.2309771Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:44.2373375Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:04:44.2388449Z 2022-05-18T05:04:44.2388707Z Running tests... 2022-05-18T05:04:44.2389132Z ---------------------------------------------------------------------- 2022-05-18T05:04:44.2763234Z test_gather (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102380 2022-05-18T05:04:44.2865380Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102381 2022-05-18T05:04:45.1693788Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbupmi3wc 2022-05-18T05:04:45.1694912Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbupmi3wc/_remote_module_non_scriptable.py 2022-05-18T05:04:45.1772107Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsybrrsd0 2022-05-18T05:04:45.1775391Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsybrrsd0/_remote_module_non_scriptable.py 2022-05-18T05:04:46.8003599Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:46.8042491Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:46.8080280Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:46.8120794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:46.8253990Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:46.8254515Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:46.8255284Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:46.8255981Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:48.1977953Z ok (3.959s) 2022-05-18T05:04:48.1978151Z 2022-05-18T05:04:48.1978555Z ---------------------------------------------------------------------- 2022-05-18T05:04:48.1978892Z Ran 1 test in 3.959s 2022-05-18T05:04:48.1979064Z 2022-05-18T05:04:48.1979159Z OK 2022-05-18T05:04:48.1979279Z 2022-05-18T05:04:48.1979409Z Generating XML reports... 2022-05-18T05:04:48.2022080Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050444.xml 2022-05-18T05:04:49.3759930Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph3b8ftd7 2022-05-18T05:04:49.3761246Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph3b8ftd7/_remote_module_non_scriptable.py 2022-05-18T05:04:50.9525709Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:50.9583996Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:04:50.9599273Z 2022-05-18T05:04:50.9599584Z Running tests... 2022-05-18T05:04:50.9600003Z ---------------------------------------------------------------------- 2022-05-18T05:04:50.9962022Z test_reduce (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102496 2022-05-18T05:04:51.0062340Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102497 2022-05-18T05:04:51.8866792Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5020r_fh 2022-05-18T05:04:51.8868790Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5020r_fh/_remote_module_non_scriptable.py 2022-05-18T05:04:51.9225349Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppjvfs3tr 2022-05-18T05:04:51.9228259Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppjvfs3tr/_remote_module_non_scriptable.py 2022-05-18T05:04:53.5496890Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:53.5530046Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:53.5538080Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:53.5568623Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:53.5750219Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:53.5750750Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:53.5751517Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:53.5752213Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:55.0169321Z ok (4.057s) 2022-05-18T05:04:55.0169539Z 2022-05-18T05:04:55.0169951Z ---------------------------------------------------------------------- 2022-05-18T05:04:55.0170580Z Ran 1 test in 4.057s 2022-05-18T05:04:55.0170772Z 2022-05-18T05:04:55.0170870Z OK 2022-05-18T05:04:55.0171007Z 2022-05-18T05:04:55.0171146Z Generating XML reports... 2022-05-18T05:04:55.0214012Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050450.xml 2022-05-18T05:04:56.2023395Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdst_88gp 2022-05-18T05:04:56.2024721Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdst_88gp/_remote_module_non_scriptable.py 2022-05-18T05:04:57.7747821Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:57.7808940Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:04:57.7825084Z 2022-05-18T05:04:57.7825369Z Running tests... 2022-05-18T05:04:57.7825806Z ---------------------------------------------------------------------- 2022-05-18T05:04:57.8210521Z test_scatter (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102612 2022-05-18T05:04:57.8312427Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102613 2022-05-18T05:04:58.7020002Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpayz3yvcb 2022-05-18T05:04:58.7021024Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpayz3yvcb/_remote_module_non_scriptable.py 2022-05-18T05:04:58.7447140Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8tc1nt69 2022-05-18T05:04:58.7450073Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8tc1nt69/_remote_module_non_scriptable.py 2022-05-18T05:05:00.3204727Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:00.3245019Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:00.3304476Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:00.3343638Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:00.3453688Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:00.3454802Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:00.3455761Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:00.3456471Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:01.7417852Z ok (3.959s) 2022-05-18T05:05:01.7418219Z 2022-05-18T05:05:01.7418907Z ---------------------------------------------------------------------- 2022-05-18T05:05:01.7419626Z Ran 1 test in 3.959s 2022-05-18T05:05:01.7419917Z 2022-05-18T05:05:01.7420016Z OK 2022-05-18T05:05:01.7420156Z 2022-05-18T05:05:01.7420292Z Generating XML reports... 2022-05-18T05:05:01.7461471Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050457.xml 2022-05-18T05:05:02.3031449Z Running distributed/fsdp/test_fsdp_comm ... [2022-05-18 05:05:02.302609] 2022-05-18T05:05:02.3032311Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_comm.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:05:02.302714] 2022-05-18T05:05:03.2335003Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_comm 2022-05-18T05:05:03.2353678Z 2022-05-18T05:05:03.2354151Z Running tests... 2022-05-18T05:05:03.2354643Z ---------------------------------------------------------------------- 2022-05-18T05:05:03.2378384Z test_communication_nested_model_False_use_no_sync_False_sharding_strategy_None (__main__.TestCommunication) 2022-05-18T05:05:04.8277446Z Tests FSDP's communication cost in terms of calls to collective ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:04.8678322Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102728 2022-05-18T05:05:04.8789529Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102729 2022-05-18T05:05:05.7874164Z dist init r=0, world=2 2022-05-18T05:05:05.7877588Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:05.8285051Z dist init r=1, world=2 2022-05-18T05:05:05.8289717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:05.8290692Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:05.8387876Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:07.1685200Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:07.1685746Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:07.1975691Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:07.1976275Z warnings.warn( 2022-05-18T05:05:07.1978670Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:07.1979214Z warnings.warn( 2022-05-18T05:05:08.2882442Z ok (5.053s) 2022-05-18T05:05:08.2907056Z test_communication_nested_model_False_use_no_sync_False_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2022-05-18T05:05:08.3038527Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102811 2022-05-18T05:05:08.3148932Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102812 2022-05-18T05:05:09.2212478Z dist init r=1, world=2 2022-05-18T05:05:09.2216284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:09.2551670Z dist init r=0, world=2 2022-05-18T05:05:09.2556735Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:09.2557753Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:09.2624447Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:10.6004580Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:10.6005135Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:10.6297489Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:05:10.6298162Z warnings.warn( 2022-05-18T05:05:10.6298914Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:10.6299450Z warnings.warn( 2022-05-18T05:05:11.7237167Z ok (3.435s) 2022-05-18T05:05:11.7260866Z test_communication_nested_model_False_use_no_sync_True_sharding_strategy_None (__main__.TestCommunication) 2022-05-18T05:05:11.7390599Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102894 2022-05-18T05:05:11.7497881Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102895 2022-05-18T05:05:12.7443986Z dist init r=1, world=2 2022-05-18T05:05:12.7447595Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:12.7454034Z dist init r=0, world=2 2022-05-18T05:05:12.7458996Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:12.7460138Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:12.7551045Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:14.0918850Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:14.0919449Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:14.1214971Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:14.1215537Z warnings.warn( 2022-05-18T05:05:14.1216424Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:14.1217284Z warnings.warn( 2022-05-18T05:05:15.2589869Z ok (3.535s) 2022-05-18T05:05:15.2613330Z test_communication_nested_model_False_use_no_sync_True_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2022-05-18T05:05:15.2745335Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102977 2022-05-18T05:05:15.2852878Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102978 2022-05-18T05:05:16.1958374Z dist init r=1, world=2 2022-05-18T05:05:16.1961198Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:16.2030113Z dist init r=0, world=2 2022-05-18T05:05:16.2035183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:16.2036332Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:16.2064338Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:05:17.5399931Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:17.5400476Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:17.5695268Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:17.5695854Z warnings.warn( 2022-05-18T05:05:17.5698042Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:17.5698609Z warnings.warn( 2022-05-18T05:05:18.6939804Z ok (3.435s) 2022-05-18T05:05:18.6963360Z test_communication_nested_model_True_use_no_sync_False_sharding_strategy_None (__main__.TestCommunication) 2022-05-18T05:05:18.7094978Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103060 2022-05-18T05:05:18.7202154Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103061 2022-05-18T05:05:19.6303997Z dist init r=1, world=2 2022-05-18T05:05:19.6307404Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:19.6396270Z dist init r=0, world=2 2022-05-18T05:05:19.6401163Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:19.6402294Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:19.6410216Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:20.9999654Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:21.0000182Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:21.0206051Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:21.0206628Z warnings.warn( 2022-05-18T05:05:21.0207396Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:21.0208244Z warnings.warn( 2022-05-18T05:05:21.5275332Z ok (2.833s) 2022-05-18T05:05:21.5299514Z test_communication_nested_model_True_use_no_sync_False_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2022-05-18T05:05:21.5433690Z Tests FSDP's communication cost in terms of calls to collective ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103143 2022-05-18T05:05:21.5540094Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103144 2022-05-18T05:05:22.4618952Z dist init r=1, world=2 2022-05-18T05:05:22.4622304Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:22.5084408Z dist init r=0, world=2 2022-05-18T05:05:22.5088928Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:22.5089787Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:22.5132265Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:23.8456561Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:23.8457100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:23.8645610Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:23.8646178Z warnings.warn( 2022-05-18T05:05:23.8682253Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:23.8682814Z warnings.warn( 2022-05-18T05:05:24.3613617Z ok (2.834s) 2022-05-18T05:05:24.3637760Z test_communication_nested_model_True_use_no_sync_True_sharding_strategy_None (__main__.TestCommunication) 2022-05-18T05:05:24.3765853Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103226 2022-05-18T05:05:24.3873466Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103227 2022-05-18T05:05:25.2972399Z dist init r=1, world=2 2022-05-18T05:05:25.2975498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:25.3334830Z dist init r=0, world=2 2022-05-18T05:05:25.3339041Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:25.3340177Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:25.3384068Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:26.6659131Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:26.6659660Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:26.6886005Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:05:26.6886584Z warnings.warn( 2022-05-18T05:05:26.6921449Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:26.6922307Z warnings.warn( 2022-05-18T05:05:27.1945652Z ok (2.833s) 2022-05-18T05:05:27.1969361Z test_communication_nested_model_True_use_no_sync_True_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2022-05-18T05:05:27.2098486Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103309 2022-05-18T05:05:27.2206177Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103310 2022-05-18T05:05:28.1253219Z dist init r=1, world=2 2022-05-18T05:05:28.1256415Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:28.1626900Z dist init r=0, world=2 2022-05-18T05:05:28.1632108Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:28.1633228Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:28.1664442Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:29.5181377Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:29.5181902Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:29.5405585Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:29.5406164Z warnings.warn( 2022-05-18T05:05:29.5441798Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:29.5442359Z warnings.warn( 2022-05-18T05:05:30.0279925Z ok (2.833s) 2022-05-18T05:05:30.0280146Z 2022-05-18T05:05:30.0280557Z ---------------------------------------------------------------------- 2022-05-18T05:05:30.0280882Z Ran 8 tests in 26.793s 2022-05-18T05:05:30.0281051Z 2022-05-18T05:05:30.0283672Z OK 2022-05-18T05:05:30.0283872Z 2022-05-18T05:05:30.0284016Z Generating XML reports... 2022-05-18T05:05:30.0336678Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_comm/TEST-TestCommunication-20220518050503.xml 2022-05-18T05:05:30.3069040Z Running distributed/fsdp/test_fsdp_sharded_grad_scaler ... [2022-05-18 05:05:30.306394] 2022-05-18T05:05:30.3069814Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_sharded_grad_scaler.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:05:30.306499] 2022-05-18T05:05:31.2247312Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_sharded_grad_scaler 2022-05-18T05:05:31.2264971Z 2022-05-18T05:05:31.2265215Z Running tests... 
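The TestCommunication cases above are parametrized over FSDP's sharding strategy (the default FULL_SHARD versus ShardingStrategy.SHARD_GRAD_OP) and over whether gradient synchronization is skipped via no_sync(). A minimal sketch of that configuration, assuming a process group is already initialized on every rank and a CUDA device is available; the module, batch, and shapes below are placeholders rather than the test's own code.

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

# Assumes torch.distributed.init_process_group(...) has already run on every rank
# and that each rank has called torch.cuda.set_device(...) appropriately.
model = nn.Linear(8, 8).cuda()

# SHARD_GRAD_OP shards gradients and optimizer state but keeps the gathered
# parameters alive between forward and backward, so it issues fewer all-gathers
# than the default FULL_SHARD strategy (the trade-off these tests measure).
fsdp_model = FSDP(model, sharding_strategy=ShardingStrategy.SHARD_GRAD_OP)

# no_sync() accumulates gradients locally without the reduce-scatter, which is
# what the use_no_sync_True cases above exercise.
with fsdp_model.no_sync():
    fsdp_model(torch.randn(4, 8).cuda()).sum().backward()
fsdp_model(torch.randn(4, 8).cuda()).sum().backward()  # synchronizing step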
2022-05-18T05:05:31.2265654Z ---------------------------------------------------------------------- 2022-05-18T05:05:32.7752076Z test_grad_scaling (__main__.TestShardGradScaler) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:32.7908848Z ok (1.564s) 2022-05-18T05:05:32.7933359Z test_inf_gradients_skip_optim_step (__main__.TestShardGradScaler) ... ok (0.002s) 2022-05-18T05:05:32.7997395Z test_scaling_unscaling_sparse (__main__.TestShardGradScaler) ... ok (0.006s) 2022-05-18T05:05:32.8262478Z test_scaler_enabled_offload_false_none_mixed_precision (__main__.TestShardedGradScalerParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103442 2022-05-18T05:05:32.8375111Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103443 2022-05-18T05:05:33.7647753Z dist init r=0, world=2 2022-05-18T05:05:33.7651466Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:33.7954480Z dist init r=1, world=2 2022-05-18T05:05:33.7958814Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:33.7959644Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:33.8061151Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:35.1464027Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:35.1464769Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:35.3492652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:35.3493212Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:35.3533244Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:35.3533835Z warnings.warn( 2022-05-18T05:05:35.3534606Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:35.3535143Z warnings.warn( 2022-05-18T05:05:35.4106528Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:35.4107211Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:35.4110685Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:35.4111351Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:35.4170472Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T05:05:35.4171286Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:35.7456188Z ok (2.946s) 2022-05-18T05:05:35.7599659Z test_scaler_enabled_offload_false_none_none (__main__.TestShardedGradScalerParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103525 2022-05-18T05:05:35.7710495Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103526 2022-05-18T05:05:36.6960788Z dist init r=0, world=2 2022-05-18T05:05:36.6963908Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:36.7229946Z dist init r=1, world=2 2022-05-18T05:05:36.7234426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:36.7235548Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:36.7270394Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:38.0865696Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:38.0866252Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:38.2946482Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:38.2947377Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:38.2984746Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:38.2985322Z warnings.warn( 2022-05-18T05:05:38.2986068Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:38.2986611Z warnings.warn( 2022-05-18T05:05:38.3490957Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:38.3491646Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:38.3492589Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:38.3493235Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:38.3589882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:38.3590390Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:38.6789713Z ok (2.933s) 2022-05-18T05:05:38.6932844Z test_scaler_enabled_offload_false_shard_grad_op_mixed_precision (__main__.TestShardedGradScalerParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103608 2022-05-18T05:05:38.7043867Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103609 2022-05-18T05:05:39.6522875Z dist init r=0, world=2 2022-05-18T05:05:39.6525977Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:39.6733949Z dist init r=1, world=2 2022-05-18T05:05:39.6738351Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:39.6739367Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:39.6832803Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:41.0463234Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:41.0464136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:41.2538041Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:41.2538600Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:41.2577932Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:41.2578517Z warnings.warn( 2022-05-18T05:05:41.2579300Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:41.2579837Z warnings.warn( 2022-05-18T05:05:41.3138462Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:41.3139468Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:41.3140411Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:41.3141080Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:41.3200155Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:41.3200641Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:41.7124268Z ok (3.033s) 2022-05-18T05:05:41.7269833Z test_scaler_enabled_offload_false_shard_grad_op_none (__main__.TestShardedGradScalerParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103691 2022-05-18T05:05:41.7379273Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103692 2022-05-18T05:05:42.6565925Z dist init r=0, world=2 2022-05-18T05:05:42.6568861Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:42.6721719Z dist init r=1, world=2 2022-05-18T05:05:42.6725813Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:42.6726634Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:42.6773858Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:44.0082592Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:44.0083160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:44.2097825Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:44.2098365Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:44.2135710Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:44.2136413Z warnings.warn( 2022-05-18T05:05:44.2137192Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:44.2137739Z warnings.warn( 2022-05-18T05:05:44.2612637Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:44.2613331Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:44.2614254Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:44.2614908Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:44.2708055Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:44.2708556Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:44.6467463Z ok (2.934s) 2022-05-18T05:05:44.6609750Z test_scaler_enabled_offload_true_none_mixed_precision (__main__.TestShardedGradScalerParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103774 2022-05-18T05:05:44.6719552Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103775 2022-05-18T05:05:45.5917097Z dist init r=1, world=2 2022-05-18T05:05:45.5920288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:45.6259833Z dist init r=0, world=2 2022-05-18T05:05:45.6264419Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:45.6265445Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:45.6328816Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:46.9683292Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:46.9683812Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:47.1695399Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:47.1695962Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:47.1734040Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:47.1734619Z warnings.warn( 2022-05-18T05:05:47.1735371Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:47.1735927Z warnings.warn( 2022-05-18T05:05:47.1851402Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:47.1851906Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:47.1894540Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:47.1895872Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:47.1897148Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:47.1898412Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:47.1899681Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:47.1901057Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:47.1902322Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:47.1903803Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:47.2597788Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:47.2598472Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:47.2600046Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:47.2600717Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:47.2655796Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:47.2656299Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:47.6799628Z ok (3.033s) 2022-05-18T05:05:47.6941641Z test_scaler_enabled_offload_true_none_none (__main__.TestShardedGradScalerParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103857 2022-05-18T05:05:47.7050843Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103858 2022-05-18T05:05:48.6255921Z dist init r=1, world=2 2022-05-18T05:05:48.6259040Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:48.6364198Z dist init r=0, world=2 2022-05-18T05:05:48.6368122Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:48.6369202Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:48.6464359Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:05:49.9783399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:49.9784175Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:50.1852003Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:50.1852544Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:50.1889599Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:50.1890201Z warnings.warn( 2022-05-18T05:05:50.1890966Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:50.1891822Z warnings.warn( 2022-05-18T05:05:50.2007573Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:50.2008098Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:50.2059584Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:50.2060906Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:50.2778989Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:50.2779694Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:50.2780648Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:50.2781314Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:50.2871344Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:50.2871841Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:50.6128329Z ok (2.933s) 2022-05-18T05:05:50.6274191Z test_scaler_enabled_offload_true_shard_grad_op_mixed_precision (__main__.TestShardedGradScalerParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103940 2022-05-18T05:05:50.6382408Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103941 2022-05-18T05:05:51.5846799Z dist init r=1, world=2 2022-05-18T05:05:51.5850316Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:51.6096268Z dist init r=0, world=2 2022-05-18T05:05:51.6100488Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:51.6101297Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:51.6157108Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:52.9510786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:52.9511369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:53.1566169Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:53.1566723Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:53.1606583Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:53.1607172Z warnings.warn( 2022-05-18T05:05:53.1607942Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:53.1608831Z warnings.warn( 2022-05-18T05:05:53.1729486Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:53.1730352Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:53.1773921Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:53.1775254Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:53.1776540Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:53.1777854Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:53.1779121Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:53.1780387Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:53.1781869Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:53.1783143Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:53.2477902Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:53.2478611Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:53.2482794Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:53.2483466Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:53.2541305Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:53.2542000Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:53.6462538Z ok (3.033s) 2022-05-18T05:05:53.6605200Z test_scaler_enabled_offload_true_shard_grad_op_none (__main__.TestShardedGradScalerParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104023 2022-05-18T05:05:53.6714913Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104024 2022-05-18T05:05:54.5860479Z dist init r=1, world=2 2022-05-18T05:05:54.5864040Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:54.6102190Z dist init r=0, world=2 2022-05-18T05:05:54.6106569Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:54.6107995Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:54.6170936Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:05:55.9532327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:55.9532920Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:56.1595056Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:56.1595758Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:56.1632548Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:56.1633375Z warnings.warn( 2022-05-18T05:05:56.1634155Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:05:56.1634707Z warnings.warn( 2022-05-18T05:05:56.1755728Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:56.1756240Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:56.1809768Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:56.1811737Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:05:56.2543111Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:56.2544296Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:56.2547029Z /opt/conda/lib/python3.9/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T05:05:56.2547975Z warnings.warn(msg, FutureWarning) 2022-05-18T05:05:56.2643558Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:56.2644549Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:05:56.6795431Z ok (3.033s) 2022-05-18T05:05:56.6795644Z 2022-05-18T05:05:56.6796305Z ---------------------------------------------------------------------- 2022-05-18T05:05:56.6796644Z Ran 11 tests in 25.453s 2022-05-18T05:05:56.6796814Z 2022-05-18T05:05:56.6796909Z OK 2022-05-18T05:05:56.6797043Z 2022-05-18T05:05:56.6797178Z Generating XML reports... 
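The FutureWarning repeated throughout the ShardedGradScaler run above comes from torch.testing.assert_allclose, which is deprecated in favor of torch.testing.assert_close. The migration is mechanical; a small sketch with placeholder tensors (the values are illustrative, not the test suite's own):

import torch

actual = torch.tensor([1.0, 2.0])
expected = torch.tensor([1.0, 2.0 + 1e-7])

# Deprecated form; this is what emits the FutureWarning captured in the log,
# and it is slated for removal in a later release.
torch.testing.assert_allclose(actual, expected)

# Replacement; rtol/atol can be passed explicitly if a caller needs to match
# the old defaults exactly.
torch.testing.assert_close(actual, expected)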
2022-05-18T05:05:56.6856081Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_sharded_grad_scaler/TEST-TestShardGradScaler-20220518050531.xml 2022-05-18T05:05:56.6866048Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_sharded_grad_scaler/TEST-TestShardedGradScalerParityWithDDP-20220518050531.xml 2022-05-18T05:05:56.9567115Z Running distributed/algorithms/test_join ... [2022-05-18 05:05:56.956203] 2022-05-18T05:05:56.9567900Z Executing ['/opt/conda/bin/python', 'distributed/algorithms/test_join.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:05:56.956305] 2022-05-18T05:05:57.8346042Z Test results will be stored in test-reports/python-unittest/distributed.algorithms.test_join 2022-05-18T05:05:57.8363383Z 2022-05-18T05:05:57.8363531Z Running tests... 2022-05-18T05:05:57.8363972Z ---------------------------------------------------------------------- 2022-05-18T05:05:57.8373449Z test_join_kwargs (__main__.TestJoin) 2022-05-18T05:05:59.3965225Z Tests passing keyword arguments to the context manager. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:59.4379716Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104141 2022-05-18T05:05:59.4495133Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104142 2022-05-18T05:06:00.3310504Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:00.3312690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:00.3482440Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:00.3486030Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:00.3487110Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:00.3517758Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:01.9570134Z ok (4.120s) 2022-05-18T05:06:01.9578748Z test_multiple_joinable_disable (__main__.TestJoin) 2022-05-18T05:06:01.9712941Z Tests ``enable=False`` for multiple :class:`Joinable` s. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104220 2022-05-18T05:06:01.9823030Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104221 2022-05-18T05:06:02.8703032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:02.8705564Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:02.9029139Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:02.9032414Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:02.9033556Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:02.9114005Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:06:04.4893340Z ok (2.532s) 2022-05-18T05:06:04.4903162Z test_multiple_joinables (__main__.TestJoin) 2022-05-18T05:06:04.5032122Z Tests the main hooks and post-hooks of multiple :class:`Joinable` s ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104299 2022-05-18T05:06:04.5143483Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104300 2022-05-18T05:06:05.3773291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:05.3775747Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:05.3891008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:05.3894432Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:05.3895924Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:05.3980998Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:07.0212364Z ok (2.532s) 2022-05-18T05:06:07.0220156Z test_multiple_joinables_throw (__main__.TestJoin) 2022-05-18T05:06:07.0353399Z Tests ``throw_on_early_termination=True`` for multiple ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104378 2022-05-18T05:06:07.0462760Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104379 2022-05-18T05:06:07.9329391Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:07.9330968Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:07.9337704Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:07.9341274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:07.9342352Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:07.9434409Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:09.5531522Z ok (2.532s) 2022-05-18T05:06:09.5541327Z test_single_joinable (__main__.TestJoin) 2022-05-18T05:06:09.5675247Z Tests the main hooks and post-hooks of a single :class:`Joinable` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104457 2022-05-18T05:06:09.5784237Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104458 2022-05-18T05:06:10.4564722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:10.4565854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:10.4635386Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:10.4639166Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:10.4640641Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:06:10.4669959Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:12.0854321Z ok (2.532s) 2022-05-18T05:06:12.0862500Z test_single_joinable_disable (__main__.TestJoin) 2022-05-18T05:06:12.0995942Z Tests ``enable=False`` for a single :class:`Joinable`. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104536 2022-05-18T05:06:12.1106039Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104537 2022-05-18T05:06:13.0333793Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:13.0335615Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:13.0463387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:13.0467777Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:13.0468591Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:13.0540709Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:14.7179451Z ok (2.632s) 2022-05-18T05:06:14.7190413Z test_single_joinable_main_hooks (__main__.TestJoin) 2022-05-18T05:06:14.7326998Z Tests the main hooks of a single :class:`Joinable`. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104615 2022-05-18T05:06:14.7439389Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104616 2022-05-18T05:06:15.6190585Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:15.6192694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:15.6483473Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:15.6486478Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:15.6487686Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:15.6499122Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:17.2508353Z ok (2.533s) 2022-05-18T05:06:17.2515531Z test_single_joinable_post_hooks (__main__.TestJoin) 2022-05-18T05:06:17.2649997Z Tests the post-hooks of a single :class:`Joinable`. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104694 2022-05-18T05:06:17.2761499Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104695 2022-05-18T05:06:18.1608865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:18.1611144Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:18.1726094Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:18.1728987Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:18.1730369Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:18.1816027Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:19.7831014Z ok (2.532s) 2022-05-18T05:06:19.7838542Z test_single_joinable_throw (__main__.TestJoin) 2022-05-18T05:06:19.7968867Z Tests ``throw_on_early_termination=True`` for a single ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104773 2022-05-18T05:06:19.8250436Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104774 2022-05-18T05:06:20.7557421Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:20.7558978Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:20.7635763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:20.7639436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:20.7640853Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:20.7662727Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:22.4321984Z ok (2.649s) 2022-05-18T05:06:22.4322208Z 2022-05-18T05:06:22.4322605Z ---------------------------------------------------------------------- 2022-05-18T05:06:22.4325481Z Ran 9 tests in 24.596s 2022-05-18T05:06:22.4325670Z 2022-05-18T05:06:22.4325766Z OK 2022-05-18T05:06:22.4325911Z 2022-05-18T05:06:22.4328041Z Generating XML reports... 2022-05-18T05:06:22.4377048Z Generated XML report: test-reports/python-unittest/distributed.algorithms.test_join/TEST-TestJoin-20220518050557.xml 2022-05-18T05:06:22.7033360Z Running distributed/fsdp/test_fsdp_misc ... [2022-05-18 05:06:22.702839] 2022-05-18T05:06:22.7034117Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_misc.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:06:22.702945] 2022-05-18T05:06:23.6358962Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_misc 2022-05-18T05:06:23.6375709Z 2022-05-18T05:06:23.6376141Z Running tests... 2022-05-18T05:06:23.6376683Z ---------------------------------------------------------------------- 2022-05-18T05:06:23.6385769Z test_device_id_auto_wrap (__main__.TestFSDPMisc) 2022-05-18T05:06:25.2130187Z Test auto wrapping propagates the device id. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:25.2530122Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104887 2022-05-18T05:06:25.2644077Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104888 2022-05-18T05:06:26.2133641Z dist init r=0, world=2 2022-05-18T05:06:26.2136464Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:26.2698915Z dist init r=1, world=2 2022-05-18T05:06:26.2703544Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:26.2704593Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:26.2747374Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:27.6268062Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:27.6268589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:27.9720563Z ok (4.334s) 2022-05-18T05:06:27.9733164Z test_fsdp_cpu_init_stays_on_cpu (__main__.TestFSDPMisc) 2022-05-18T05:06:27.9865186Z Ensure that CPU model input stays on CPU ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104966 2022-05-18T05:06:27.9972679Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104967 2022-05-18T05:06:28.9044652Z dist init r=0, world=2 2022-05-18T05:06:28.9047810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:28.9326460Z dist init r=1, world=2 2022-05-18T05:06:28.9331166Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:28.9331979Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:28.9354317Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:30.2861174Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:30.2861722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:30.8056313Z ok (2.833s) 2022-05-18T05:06:30.8076807Z test_fsdp_device_id_use_index_False (__main__.TestFSDPMisc) 2022-05-18T05:06:30.8211265Z If CPU module is passed into FSDP with device_id ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105049 2022-05-18T05:06:30.8318333Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105050 2022-05-18T05:06:31.7535661Z dist init r=0, world=2 2022-05-18T05:06:31.7539046Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:31.7846844Z dist init r=1, world=2 2022-05-18T05:06:31.7851389Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:31.7852204Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:31.7947560Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:06:33.1138919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:33.4388438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:33.4388865Z ok (2.633s) 2022-05-18T05:06:33.4408211Z test_fsdp_device_id_use_index_True (__main__.TestFSDPMisc) 2022-05-18T05:06:33.4540352Z If CPU module is passed into FSDP with device_id ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105128 2022-05-18T05:06:33.4649116Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105129 2022-05-18T05:06:34.3775461Z dist init r=0, world=2 2022-05-18T05:06:34.3778461Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:34.3842992Z dist init r=1, world=2 2022-05-18T05:06:34.3847392Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:34.3848946Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:34.3881562Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:35.7266534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:35.7267066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:36.0719228Z ok (2.633s) 2022-05-18T05:06:36.0730640Z test_fsdp_same_model_across_ranks (__main__.TestFSDPMisc) 2022-05-18T05:06:36.0863621Z FSDP broadcasts model from rank 0 to ensure it starts off with the same ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105207 2022-05-18T05:06:36.0972775Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105208 2022-05-18T05:06:37.0600746Z dist init r=1, world=2 2022-05-18T05:06:37.0603544Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:37.0611439Z dist init r=0, world=2 2022-05-18T05:06:37.0615943Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:37.0616896Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:37.0706825Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:38.4262154Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:38.4262677Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:38.7043582Z ok (2.632s) 2022-05-18T05:06:38.7051399Z test_module_device_mismatches_device_id (__main__.TestFSDPMisc) 2022-05-18T05:06:38.7183416Z FSDP raises errors when module is on a GPU that does ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105286 2022-05-18T05:06:38.7291416Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105287 2022-05-18T05:06:39.6706619Z dist init r=0, world=2 2022-05-18T05:06:39.6710086Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:39.6901519Z dist init r=1, world=2 2022-05-18T05:06:39.6905883Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:39.6906743Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:39.6914809Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:41.0524245Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:41.0524788Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:41.3360907Z ok (2.632s) 2022-05-18T05:06:41.3369227Z test_multi_device_not_supported (__main__.TestFSDPMisc) 2022-05-18T05:06:41.3504420Z FSDP throws appropriate error when we wrap multi-device module. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105365 2022-05-18T05:06:41.3613453Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105366 2022-05-18T05:06:42.3106540Z dist init r=0, world=2 2022-05-18T05:06:42.3109697Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:42.3242378Z dist init r=1, world=2 2022-05-18T05:06:42.3246988Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:42.3247855Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:42.3314707Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:43.6726463Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:43.6727012Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:43.9683566Z ok (2.632s) 2022-05-18T05:06:43.9693832Z test_no_params (__main__.TestFSDPMisc) 2022-05-18T05:06:43.9825286Z Test that device_id and cpu init work if module has no params ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105444 2022-05-18T05:06:43.9935341Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105445 2022-05-18T05:06:44.9066109Z dist init r=0, world=2 2022-05-18T05:06:44.9071986Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:44.9513184Z dist init r=1, world=2 2022-05-18T05:06:44.9517421Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:44.9518496Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:44.9579341Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:06:46.3144023Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:46.3144745Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:46.6003723Z ok (2.632s) 2022-05-18T05:06:46.6003933Z 2022-05-18T05:06:46.6004324Z ---------------------------------------------------------------------- 2022-05-18T05:06:46.6004691Z Ran 8 tests in 22.963s 2022-05-18T05:06:46.6008148Z 2022-05-18T05:06:46.6008556Z OK 2022-05-18T05:06:46.6008731Z 2022-05-18T05:06:46.6008878Z Generating XML reports... 2022-05-18T05:06:46.6069318Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMisc-20220518050623.xml 2022-05-18T05:06:46.8751093Z Running distributed/_shard/checkpoint/test_checkpoint ... [2022-05-18 05:06:46.874588] 2022-05-18T05:06:46.8752150Z Executing ['/opt/conda/bin/python', 'distributed/_shard/checkpoint/test_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:06:46.874688] 2022-05-18T05:06:47.8066183Z Test results will be stored in test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint 2022-05-18T05:06:47.8086036Z 2022-05-18T05:06:47.8086277Z Running tests... 2022-05-18T05:06:47.8086721Z ---------------------------------------------------------------------- 2022-05-18T05:06:49.3956868Z test_checkpoint_has_shard_overlap (__main__.TestDistributedCheckpointing) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:49.4365176Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105558 2022-05-18T05:06:49.4478739Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105559 2022-05-18T05:06:50.3571571Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:50.3576198Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:50.3593537Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:50.3599210Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:50.3600411Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:50.3679362Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:52.0553603Z ok (4.246s) 2022-05-18T05:06:52.0696923Z test_checkpoint_has_shard_too_small (__main__.TestDistributedCheckpointing) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105637 2022-05-18T05:06:52.0805331Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105638 2022-05-18T05:06:52.9979714Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:52.9983407Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:53.0043074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:53.0048523Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:53.0049888Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
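[editor's note] Each "Running <file> ... / Executing ['/opt/conda/bin/python', '<file>', '-v', '--import-slow-tests', '--import-disabled-tests']" pair above shows the test driver shelling out to one test file at a time in a fresh interpreter. A rough sketch of that per-file invocation pattern is below; the helper name and the example file list are assumptions, the real driver being PyTorch's run_test.py.

```python
# Sketch of the per-file invocation visible in the "Executing [...]" lines.
# `run_test_file` and the sample file list are illustrative only.
import subprocess
import sys
from datetime import datetime


def run_test_file(test_file: str) -> int:
    cmd = [sys.executable, test_file, "-v", "--import-slow-tests", "--import-disabled-tests"]
    print(f"Executing {cmd} ... [{datetime.now().isoformat(sep=' ', timespec='microseconds')}]")
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    for f in ["distributed/_shard/checkpoint/test_checkpoint.py"]:
        if run_test_file(f) != 0:
            sys.exit(1)
```

Running each file in its own process keeps CUDA context and process-group state from leaking between suites, which matches the fresh "numba.cuda.cudadrv.driver:init" line at the start of each file's output.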
2022-05-18T05:06:53.0086574Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:54.6875020Z ok (2.632s) 2022-05-18T05:06:54.7017923Z test_checkpoint_has_storage_type_mismatch (__main__.TestDistributedCheckpointing) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105716 2022-05-18T05:06:54.7126389Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105717 2022-05-18T05:06:55.6066421Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:55.6071272Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:55.6216098Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:55.6221729Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:55.6222890Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:55.6276761Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:57.3195435Z ok (2.632s) 2022-05-18T05:06:57.3348012Z test_storage_key_mapping (__main__.TestDistributedCheckpointing) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105795 2022-05-18T05:06:57.3455363Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105796 2022-05-18T05:06:58.2347991Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:58.2352297Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:58.2459936Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:58.2465177Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:58.2466453Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:58.2557209Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:59.8522217Z ok (2.532s) 2022-05-18T05:06:59.8665696Z test_tensor_metadata_with_missing_rank_spec (__main__.TestDistributedCheckpointing) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105874 2022-05-18T05:06:59.8775864Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105875 2022-05-18T05:07:00.7968847Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:00.7973092Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:00.8231497Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:00.8237186Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:00.8238005Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:07:00.8280038Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:02.4845069Z ok (2.632s) 2022-05-18T05:07:02.4998584Z test_validate_metadata (__main__.TestDistributedCheckpointing) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105953 2022-05-18T05:07:02.5111736Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105954 2022-05-18T05:07:03.4424967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:03.4429489Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:03.4839011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:03.4844810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:03.4845950Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:03.4939482Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:05.1182336Z ok (2.634s) 2022-05-18T05:07:05.1203158Z test_create_key_handles_collision (__main__.TestStorageKeys) ... ok (0.002s) 2022-05-18T05:07:05.1205280Z 2022-05-18T05:07:05.1206158Z ---------------------------------------------------------------------- 2022-05-18T05:07:05.1206640Z Ran 7 tests in 17.312s 2022-05-18T05:07:05.1206808Z 2022-05-18T05:07:05.1208302Z OK 2022-05-18T05:07:05.1208491Z 2022-05-18T05:07:05.1208641Z Generating XML reports... 2022-05-18T05:07:05.1254179Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestDistributedCheckpointing-20220518050647.xml 2022-05-18T05:07:05.1257238Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestStorageKeys-20220518050647.xml 2022-05-18T05:07:05.3996704Z Running distributed/_shard/sharded_tensor/ops/test_matrix_ops ... [2022-05-18 05:07:05.399174] 2022-05-18T05:07:05.3997472Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/ops/test_matrix_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:07:05.399276] 2022-05-18T05:07:06.2885905Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops 2022-05-18T05:07:06.2902212Z 2022-05-18T05:07:06.2902467Z Running tests... 2022-05-18T05:07:06.2902887Z ---------------------------------------------------------------------- 2022-05-18T05:07:07.8418146Z test_sharded_tensor_contiguous (__main__.TestShardedTensorMatrixOps) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:07.8821946Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106067 2022-05-18T05:07:07.8936259Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106068 2022-05-18T05:07:07.9055549Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 106069 2022-05-18T05:07:07.9178696Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 106070 2022-05-18T05:07:08.8146192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:08.8235506Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:08.8863214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:07:08.8971847Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:07:09.1227579Z skip: Need at least 4 CUDA devices (2.832s) 2022-05-18T05:07:09.1378060Z test_sharded_tensor_layer_norm (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106203 2022-05-18T05:07:09.1487071Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106204 2022-05-18T05:07:09.1602774Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 106205 2022-05-18T05:07:09.1715744Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 106206 2022-05-18T05:07:10.1446613Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:07:10.1574617Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:10.1702027Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:10.2470679Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:07:10.3758756Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:07:10.3908505Z test_sharded_tensor_layer_norm_error (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106339 2022-05-18T05:07:10.4019044Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106340 2022-05-18T05:07:10.4135956Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 106341 2022-05-18T05:07:10.4254548Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 106342 2022-05-18T05:07:11.3152626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:11.3288067Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:11.3511942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:07:11.3769562Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:07:11.5297481Z skip: Need at least 4 CUDA devices (1.154s) 2022-05-18T05:07:11.5435213Z test_sharded_tensor_masked_fill (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106475 2022-05-18T05:07:11.5545581Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106476 2022-05-18T05:07:11.5660946Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 106477 2022-05-18T05:07:11.5775270Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 106478 2022-05-18T05:07:12.5232074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:12.5457309Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:07:12.5887923Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:12.6262350Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:07:12.7817222Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:07:12.7963851Z test_sharded_tensor_masked_fill_error (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106611 2022-05-18T05:07:12.8073494Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106612 2022-05-18T05:07:12.8186532Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 106613 2022-05-18T05:07:12.8302768Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 106614 2022-05-18T05:07:13.7310876Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:13.7343588Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:07:13.7992826Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:07:13.8101223Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:14.0344134Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:07:14.0489473Z test_sharded_tensor_softmax (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106747 2022-05-18T05:07:14.0599616Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106748 2022-05-18T05:07:14.0715471Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 106749 2022-05-18T05:07:14.0831942Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 106750 2022-05-18T05:07:14.9767324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:14.9868586Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:07:15.0295682Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:15.0422275Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:07:15.1871860Z skip: Need at least 4 CUDA devices (1.153s) 2022-05-18T05:07:15.2028517Z test_sharded_tensor_transpose (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106883 2022-05-18T05:07:15.2137700Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106884 2022-05-18T05:07:15.2249652Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 106885 2022-05-18T05:07:15.2364841Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 106886 2022-05-18T05:07:16.1252562Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:16.1748197Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:07:16.1846945Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:16.1931466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:07:16.3405839Z skip: Need at least 4 CUDA devices (1.153s) 2022-05-18T05:07:16.3546753Z test_sharded_tensor_transpose_error (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107019 2022-05-18T05:07:16.3655008Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107020 2022-05-18T05:07:16.3769482Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 107021 2022-05-18T05:07:16.3882906Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 107022 2022-05-18T05:07:17.3266854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:07:17.3308658Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:17.3491317Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:07:17.3547674Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:17.5926384Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:07:17.6072833Z test_sharded_tensor_type_as (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107155 2022-05-18T05:07:17.6180625Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107156 2022-05-18T05:07:17.6292902Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 107157 2022-05-18T05:07:17.6408729Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 107158 2022-05-18T05:07:18.5450584Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:07:18.6164169Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:07:18.6170443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:18.6334606Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:18.8452207Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:07:18.8601235Z test_sharded_tensor_view (__main__.TestShardedTensorMatrixOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107291 2022-05-18T05:07:18.8713094Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107292 2022-05-18T05:07:18.8829080Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 107293 2022-05-18T05:07:18.8944487Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 107294 2022-05-18T05:07:19.8084926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:19.8085453Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:07:19.8198568Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:07:19.8768599Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:20.0985988Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:07:20.1131255Z test_sharded_tensor_view_error (__main__.TestShardedTensorMatrixOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107427 2022-05-18T05:07:20.1240359Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107428 2022-05-18T05:07:20.1352905Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 107429 2022-05-18T05:07:20.1467686Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 107430 2022-05-18T05:07:21.1209702Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:07:21.1227104Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:21.1393260Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:07:21.1398476Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:21.3510392Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:07:21.3510826Z 2022-05-18T05:07:21.3511508Z ---------------------------------------------------------------------- 2022-05-18T05:07:21.3512107Z Ran 11 tests in 15.061s 2022-05-18T05:07:21.3512398Z 2022-05-18T05:07:21.3512600Z OK (skipped=11) 2022-05-18T05:07:21.3514055Z 2022-05-18T05:07:21.3514543Z Generating XML reports... 2022-05-18T05:07:21.3570098Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops/TEST-TestShardedTensorMatrixOps-20220518050706.xml 2022-05-18T05:07:21.6308737Z Running distributed/fsdp/test_fsdp_memory ... [2022-05-18 05:07:21.630323] 2022-05-18T05:07:21.6309481Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_memory.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:07:21.630460] 2022-05-18T05:07:22.5103061Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_memory 2022-05-18T05:07:22.5119405Z 2022-05-18T05:07:22.5119841Z Running tests... 2022-05-18T05:07:22.5120319Z ---------------------------------------------------------------------- 2022-05-18T05:07:24.1077189Z test_fsdp_memory_ckpt_ckpt (__main__.TestFSDPMemory) ... 
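[editor's note] Every test in the matrix-ops suite above ends in "skip: Need at least 4 CUDA devices" because this runner exposes fewer GPUs than the 4-way sharding tests require, so the whole file reports "OK (skipped=11)". The message comes from an internal GPU-count guard in torch.testing._internal.common_distributed; the standalone equivalent below is a sketch under that assumption, not the actual decorator.

```python
# Sketch of a "skip unless N GPUs" guard equivalent to the
# "skip: Need at least 4 CUDA devices" entries above.
# `skip_if_lt_x_gpu` here is a stand-in for PyTorch's internal decorator.
import unittest
import torch


def skip_if_lt_x_gpu(x: int):
    return unittest.skipIf(
        not torch.cuda.is_available() or torch.cuda.device_count() < x,
        f"Need at least {x} CUDA devices",
    )


class TestShardedOps(unittest.TestCase):
    @skip_if_lt_x_gpu(4)
    def test_sharded_tensor_view(self):
        self.assertTrue(True)  # a real test would exercise 4-way sharding here


if __name__ == "__main__":
    unittest.main()
```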
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:24.1476162Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107598 2022-05-18T05:07:24.1589560Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107599 2022-05-18T05:07:25.0553722Z dist init r=0, world=2 2022-05-18T05:07:25.0556851Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:25.0989175Z dist init r=1, world=2 2022-05-18T05:07:25.0994374Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:25.0995497Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:25.1066804Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:26.4588451Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:26.4588956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:26.4953356Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:07:26.4954038Z warnings.warn( 2022-05-18T05:07:26.4961091Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:07:26.4961649Z warnings.warn( 2022-05-18T05:07:29.6724247Z ok (7.160s) 2022-05-18T05:07:29.6876221Z test_fsdp_memory_ckpt_no_ckpt (__main__.TestFSDPMemory) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107681 2022-05-18T05:07:29.6987343Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107682 2022-05-18T05:07:30.6099250Z dist init r=1, world=2 2022-05-18T05:07:30.6102232Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:30.6336900Z dist init r=0, world=2 2022-05-18T05:07:30.6341499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:30.6342314Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:30.6408610Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:31.9910891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:31.9911476Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:32.0268947Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:07:32.0269536Z warnings.warn( 2022-05-18T05:07:32.0285964Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:07:32.0286506Z warnings.warn( 2022-05-18T05:07:34.7108234Z ok (5.038s) 2022-05-18T05:07:34.7108477Z 2022-05-18T05:07:34.7108895Z ---------------------------------------------------------------------- 2022-05-18T05:07:34.7109226Z Ran 2 tests in 12.199s 2022-05-18T05:07:34.7109398Z 2022-05-18T05:07:34.7109496Z OK 2022-05-18T05:07:34.7109635Z 2022-05-18T05:07:34.7109773Z Generating XML reports... 2022-05-18T05:07:34.7166467Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_memory/TEST-TestFSDPMemory-20220518050722.xml 2022-05-18T05:07:34.9950935Z Running distributed/_shard/checkpoint/test_file_system_checkpoint ... [2022-05-18 05:07:34.994583] 2022-05-18T05:07:34.9951741Z Executing ['/opt/conda/bin/python', 'distributed/_shard/checkpoint/test_file_system_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:07:34.994682] 2022-05-18T05:07:35.9255170Z Test results will be stored in test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint 2022-05-18T05:07:35.9275357Z 2022-05-18T05:07:35.9275517Z Running tests... 2022-05-18T05:07:35.9275975Z ---------------------------------------------------------------------- 2022-05-18T05:07:37.5077718Z test_load_rowwise_to_colwise (__main__.TestDistributedReshardOnLoad) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:37.5486840Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107799 2022-05-18T05:07:37.5601427Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107800 2022-05-18T05:07:38.4758025Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:38.4761167Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:38.4776010Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:38.4780849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:38.4781873Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:38.4865089Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:40.0675889Z ok (4.140s) 2022-05-18T05:07:40.0842537Z test_load_with_different_shard_plan (__main__.TestDistributedReshardOnLoad) ... 
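[editor's note] The UserWarning from fully_sharded_data_parallel.py:911 in the memory tests above is emitted when a module still resident on CPU is wrapped with an explicit device_id: FSDP temporarily moves it to that GPU for parameter verification, flattening, and sharding, then moves it back. A minimal sketch of the pattern that triggers it follows; the toy model, the single-rank NCCL process group, and the rendezvous port are assumptions for the sake of a self-contained example.

```python
# Sketch of wrapping a CPU-resident module with FSDP and a device_id, which
# triggers the "Module is input on CPU, we are moving it to <rank>" warning
# seen above. Single-rank process group and toy model are assumptions.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(
    backend="nccl", init_method="tcp://127.0.0.1:29502", rank=0, world_size=1
)

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))  # still on CPU
fsdp_model = FSDP(model, device_id=torch.cuda.current_device())      # warns, then shards

x = torch.randn(4, 8, device=torch.cuda.current_device())
fsdp_model(x).sum().backward()
dist.destroy_process_group()
```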
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107878 2022-05-18T05:07:40.0955667Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107879 2022-05-18T05:07:41.0361210Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:41.0365091Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:41.0528000Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:41.0532522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:41.0533594Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:41.0569657Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:42.8027100Z ok (2.735s) 2022-05-18T05:07:42.8171861Z test_save_load_bytes (__main__.TestDistributedReshardOnLoad) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107957 2022-05-18T05:07:42.8281163Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107958 2022-05-18T05:07:43.7418378Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:43.7421763Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:43.7936555Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:43.7941467Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:43.7942291Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:43.8033461Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:45.4351238Z ok (2.632s) 2022-05-18T05:07:45.4619026Z test_read_write_only_tensor (__main__.TestDistributedStateDictSaveLoad) ... ok (0.027s) 2022-05-18T05:07:45.4759570Z test_read_write_shard_tensor (__main__.TestDistributedStateDictSaveLoadWithSharedTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108036 2022-05-18T05:07:45.4869080Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108037 2022-05-18T05:07:46.4641919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:46.4645079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:46.4679526Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:46.4684309Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:46.4685642Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:46.4748030Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:07:48.0938906Z ok (2.632s) 2022-05-18T05:07:48.0939270Z 2022-05-18T05:07:48.0940051Z ---------------------------------------------------------------------- 2022-05-18T05:07:48.0940580Z Ran 5 tests in 12.166s 2022-05-18T05:07:48.0940752Z 2022-05-18T05:07:48.0940850Z OK 2022-05-18T05:07:48.0941103Z 2022-05-18T05:07:48.0941609Z Generating XML reports... 2022-05-18T05:07:48.0987945Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedReshardOnLoad-20220518050735.xml 2022-05-18T05:07:48.0989451Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedStateDictSaveLoad-20220518050735.xml 2022-05-18T05:07:48.0992782Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedStateDictSaveLoadWithSharedTensor-20220518050735.xml 2022-05-18T05:07:48.3674702Z Running distributed/elastic/timer/local_timer_example ... [2022-05-18 05:07:48.366983] 2022-05-18T05:07:48.3675488Z Executing ['/opt/conda/bin/python', 'distributed/elastic/timer/local_timer_example.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:07:48.367083] 2022-05-18T05:07:49.2487488Z Test results will be stored in test-reports/python-unittest/distributed.elastic.timer.local_timer_example 2022-05-18T05:07:49.2502594Z 2022-05-18T05:07:49.2502832Z Running tests... 2022-05-18T05:07:49.2503262Z ---------------------------------------------------------------------- 2022-05-18T05:07:50.8275992Z test_example_start_method_spawn (__main__.LocalTimerExample) ... [INFO] 2022-05-18 05:07:50,827 driver: init 2022-05-18T05:07:50.8589317Z [INFO] 2022-05-18 05:07:50,858 api: Starting LocalTimerServer... max_interval=0.01, daemon=True 2022-05-18T05:07:50.8590251Z [INFO] 2022-05-18 05:07:50,858 api: Starting watchdog thread... 2022-05-18T05:07:51.9959001Z [INFO] 2022-05-18 05:07:51,995 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:51.9970458Z [INFO] 2022-05-18 05:07:51,996 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:52.0302894Z [INFO] 2022-05-18 05:07:52,029 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:52.0491724Z [INFO] 2022-05-18 05:07:52,048 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:52.0522738Z [INFO] 2022-05-18 05:07:52,051 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:52.0787421Z [INFO] 2022-05-18 05:07:52,078 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:52.0795720Z [INFO] 2022-05-18 05:07:52,079 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:52.1045838Z [INFO] 2022-05-18 05:07:52,104 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:53.0512506Z [INFO] 2022-05-18 05:07:53,050 api: Reaping worker_id=[108153]. Expired timers: ['/opt/conda/lib/python3.9/contextlib.py#119'] 2022-05-18T05:07:53.0514058Z [INFO] 2022-05-18 05:07:53,051 api: Successfully reaped worker=[108153] 2022-05-18T05:07:53.0515143Z [INFO] 2022-05-18 05:07:53,051 api: Reaping worker_id=[108151]. Expired timers: ['/opt/conda/lib/python3.9/contextlib.py#119'] 2022-05-18T05:07:53.0518838Z [INFO] 2022-05-18 05:07:53,051 api: Successfully reaped worker=[108151] 2022-05-18T05:07:53.0826787Z [INFO] 2022-05-18 05:07:53,082 api: Reaping worker_id=[108155]. 
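[editor's note] The test_file_system_checkpoint suite that just finished exercises saving a (possibly sharded) state dict through a filesystem-backed writer and reading it back, including resharding on load. A heavily hedged sketch of that save/load flow is below, using the torch.distributed._shard.checkpoint names from this era; the exact keyword signatures, the single-rank gloo group, and the /tmp/ckpt path are assumptions, and a plain dense tensor keeps the example self-contained.

```python
# Heavily hedged sketch of the save/load flow this suite exercises.
# Module path and argument names follow the torch.distributed._shard.checkpoint
# API of this era; treat exact signatures, paths, and the gloo group as assumptions.
import os
import torch
import torch.distributed as dist
from torch.distributed._shard.checkpoint import (
    FileSystemReader,
    FileSystemWriter,
    load_state_dict,
    save_state_dict,
)

dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29503", rank=0, world_size=1)
os.makedirs("/tmp/ckpt", exist_ok=True)

state_dict = {"weight": torch.randn(4, 4)}
save_state_dict(state_dict=state_dict, storage_writer=FileSystemWriter("/tmp/ckpt"))

restored = {"weight": torch.zeros(4, 4)}  # intended to be filled in place on load
load_state_dict(state_dict=restored, storage_reader=FileSystemReader("/tmp/ckpt"))
dist.destroy_process_group()
```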
Expired timers: ['/opt/conda/lib/python3.9/contextlib.py#119'] 2022-05-18T05:07:53.0828409Z [INFO] 2022-05-18 05:07:53,082 api: Successfully reaped worker=[108155] 2022-05-18T05:07:53.1644155Z [INFO] 2022-05-18 05:07:53,163 api: Reaping worker_id=[108157]. Expired timers: ['/opt/conda/lib/python3.9/contextlib.py#119'] 2022-05-18T05:07:53.1647411Z [INFO] 2022-05-18 05:07:53,164 api: Successfully reaped worker=[108157] 2022-05-18T05:07:53.1711179Z [INFO] 2022-05-18 05:07:53,170 api: Stopping LocalTimerServer 2022-05-18T05:07:53.1711660Z [INFO] 2022-05-18 05:07:53,170 api: Stopping watchdog thread... 2022-05-18T05:07:53.1754617Z ok (3.925s) 2022-05-18T05:07:53.1776277Z test_torch_mp_example (__main__.LocalTimerExample) ... [INFO] 2022-05-18 05:07:53,177 api: Starting LocalTimerServer... max_interval=0.01, daemon=True 2022-05-18T05:07:53.1776854Z [INFO] 2022-05-18 05:07:53,177 api: Starting watchdog thread... 2022-05-18T05:07:54.2996485Z [INFO] 2022-05-18 05:07:54,299 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:54.3009353Z [INFO] 2022-05-18 05:07:54,300 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:54.3043229Z [INFO] 2022-05-18 05:07:54,303 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:54.3288739Z [INFO] 2022-05-18 05:07:54,328 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:54.3301861Z [INFO] 2022-05-18 05:07:54,329 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:54.3409669Z [INFO] 2022-05-18 05:07:54,340 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:54.3526898Z [INFO] 2022-05-18 05:07:54,352 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:54.3586054Z [INFO] 2022-05-18 05:07:54,358 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:56.1900516Z [INFO] 2022-05-18 05:07:56,189 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:56.2157405Z [INFO] 2022-05-18 05:07:56,215 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:56.2319059Z [INFO] 2022-05-18 05:07:56,231 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:56.2588327Z [INFO] 2022-05-18 05:07:56,258 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:56.2600354Z [INFO] 2022-05-18 05:07:56,259 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:56.2863017Z [INFO] 2022-05-18 05:07:56,285 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:56.2927047Z [INFO] 2022-05-18 05:07:56,292 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:56.3234297Z [INFO] 2022-05-18 05:07:56,323 api: Timer client configured to: LocalTimerClient 2022-05-18T05:07:57.2492243Z [INFO] 2022-05-18 05:07:57,248 api: Reaping worker_id=[108698]. Expired timers: ['/opt/conda/lib/python3.9/contextlib.py#119'] 2022-05-18T05:07:57.2495660Z [INFO] 2022-05-18 05:07:57,249 api: Successfully reaped worker=[108698] 2022-05-18T05:07:57.2702025Z [INFO] 2022-05-18 05:07:57,269 api: Reaping worker_id=[108700]. Expired timers: ['/opt/conda/lib/python3.9/contextlib.py#119'] 2022-05-18T05:07:57.2703292Z [INFO] 2022-05-18 05:07:57,269 api: Successfully reaped worker=[108700] 2022-05-18T05:07:57.2908976Z [INFO] 2022-05-18 05:07:57,290 api: Reaping worker_id=[108699]. Expired timers: ['/opt/conda/lib/python3.9/contextlib.py#119'] 2022-05-18T05:07:57.2909855Z [INFO] 2022-05-18 05:07:57,290 local_timer: Process with pid=108699 does not exist. 
Skipping 2022-05-18T05:07:57.2910367Z [INFO] 2022-05-18 05:07:57,290 api: Successfully reaped worker=[108699] 2022-05-18T05:07:57.2995558Z [INFO] 2022-05-18 05:07:57,299 api: Stopping LocalTimerServer 2022-05-18T05:07:57.2996393Z [INFO] 2022-05-18 05:07:57,299 api: Stopping watchdog thread... 2022-05-18T05:07:57.3016166Z ok (4.126s) 2022-05-18T05:07:57.3018994Z 2022-05-18T05:07:57.3019530Z ---------------------------------------------------------------------- 2022-05-18T05:07:57.3020313Z Ran 2 tests in 8.052s 2022-05-18T05:07:57.3020616Z 2022-05-18T05:07:57.3020719Z OK 2022-05-18T05:07:57.3020839Z 2022-05-18T05:07:57.3020968Z Generating XML reports... 2022-05-18T05:07:57.3063227Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_example/TEST-LocalTimerExample-20220518050749.xml 2022-05-18T05:07:57.5696503Z Running distributed/_shard/test_partial_tensor ... [2022-05-18 05:07:57.569105] 2022-05-18T05:07:57.5697579Z Executing ['/opt/conda/bin/python', 'distributed/_shard/test_partial_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:07:57.569206] 2022-05-18T05:07:58.4750758Z Test results will be stored in test-reports/python-unittest/distributed._shard.test_partial_tensor 2022-05-18T05:07:58.4767431Z 2022-05-18T05:07:58.4767723Z Running tests... 2022-05-18T05:07:58.4768168Z ---------------------------------------------------------------------- 2022-05-18T05:08:00.0248413Z test_cat (__main__.TestPartialTensorOps) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:00.0651576Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109003 2022-05-18T05:08:00.0765268Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109004 2022-05-18T05:08:00.0881651Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 109005 2022-05-18T05:08:00.1001775Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 109006 2022-05-18T05:08:01.0414086Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:01.0931414Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:01.1115430Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:01.1474459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:01.3049236Z skip: Need at least 4 CUDA devices (2.828s) 2022-05-18T05:08:01.3198013Z test_cat_errors (__main__.TestPartialTensorOps) ... 
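[editor's note] The local_timer_example run above starts a LocalTimerServer over a multiprocessing queue ("Starting watchdog thread..."), each worker configures a LocalTimerClient, and any worker whose expires() scope outlives its deadline is killed, producing the "Reaping worker_id=[...] Expired timers" and "Successfully reaped worker" lines. A hedged sketch of that shape follows; the worker body and the 60-second deadline are made up, and the queue/process wiring is simplified relative to the example script.

```python
# Hedged sketch of the LocalTimerServer / LocalTimerClient flow logged by
# local_timer_example: a watchdog server reaps workers whose expires() scope
# runs past its deadline. Worker body and deadline are illustrative.
import time
import torch.multiprocessing as mp
import torch.distributed.elastic.timer as timer


def worker(mp_queue):
    timer.configure(timer.LocalTimerClient(mp_queue))
    # If this block takes longer than `after` seconds, the server's watchdog
    # thread expires the timer and kills this process.
    with timer.expires(after=60):
        time.sleep(1)


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    mp_queue = ctx.Queue()
    server = timer.LocalTimerServer(mp_queue, max_interval=0.01)
    server.start()  # "Starting LocalTimerServer... / Starting watchdog thread..."
    p = ctx.Process(target=worker, args=(mp_queue,))
    p.start()
    p.join()
    server.stop()   # "Stopping LocalTimerServer / Stopping watchdog thread..."
```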
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109139 2022-05-18T05:08:01.3313031Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109140 2022-05-18T05:08:01.3438098Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 109141 2022-05-18T05:08:01.3557934Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 109142 2022-05-18T05:08:02.2777146Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:02.2939423Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:02.3446512Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:02.3668604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:02.5600043Z skip: Need at least 4 CUDA devices (1.255s) 2022-05-18T05:08:02.5742348Z test_transpose (__main__.TestPartialTensorOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109275 2022-05-18T05:08:02.5851493Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109276 2022-05-18T05:08:02.5966561Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 109277 2022-05-18T05:08:02.6082826Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 109278 2022-05-18T05:08:03.5390538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:03.5399039Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:03.5529514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:03.5775656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:03.8124550Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:08:03.8264358Z test_partial_tensor_reshard (__main__.TestPartialTensorReshard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109411 2022-05-18T05:08:03.8373907Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109412 2022-05-18T05:08:03.8491096Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 109413 2022-05-18T05:08:03.8604951Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 109414 2022-05-18T05:08:04.8418301Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:04.8603236Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:04.8611963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:04.8826497Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:05.0646884Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:08:05.0796689Z test_partial_tensor_reshard_errors (__main__.TestPartialTensorReshard) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109547 2022-05-18T05:08:05.0905352Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109548 2022-05-18T05:08:05.1018100Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 109549 2022-05-18T05:08:05.1134810Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 109550 2022-05-18T05:08:06.0039647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:06.0182656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:06.0183150Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:06.0185053Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:06.2174734Z skip: Need at least 4 CUDA devices (1.153s) 2022-05-18T05:08:06.2175015Z 2022-05-18T05:08:06.2175419Z ---------------------------------------------------------------------- 2022-05-18T05:08:06.2175743Z Ran 5 tests in 7.741s 2022-05-18T05:08:06.2175907Z 2022-05-18T05:08:06.2176020Z OK (skipped=5) 2022-05-18T05:08:06.2176180Z 2022-05-18T05:08:06.2176303Z Generating XML reports... 2022-05-18T05:08:06.2221773Z Generated XML report: test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorOps-20220518050758.xml 2022-05-18T05:08:06.2226573Z Generated XML report: test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorReshard-20220518050758.xml 2022-05-18T05:08:06.4898664Z Running distributed/fsdp/test_fsdp_input ... [2022-05-18 05:08:06.489354] 2022-05-18T05:08:06.4899470Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_input.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:08:06.489460] 2022-05-18T05:08:07.4025165Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_input 2022-05-18T05:08:07.4043417Z 2022-05-18T05:08:07.4043807Z Running tests... 2022-05-18T05:08:07.4044711Z ---------------------------------------------------------------------- 2022-05-18T05:08:07.4062286Z test_input_type_dict (__main__.TestInput) 2022-05-18T05:08:08.9850208Z Test FSDP with input being a list or a dict, only single GPU. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:09.0261846Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109718 2022-05-18T05:08:09.9517632Z dist init r=0, world=1 2022-05-18T05:08:09.9520630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:08:09.9521586Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:08:11.2183448Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:11.2405401Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:08:11.2405989Z warnings.warn( 2022-05-18T05:08:11.6333717Z ok (4.229s) 2022-05-18T05:08:11.6350903Z test_input_type_list (__main__.TestInput) 2022-05-18T05:08:11.6485551Z Test FSDP with input being a list or a dict, only single GPU. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109760 2022-05-18T05:08:12.5645220Z dist init r=0, world=1 2022-05-18T05:08:12.5648291Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:08:12.5649496Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:08:13.8364617Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:13.8604991Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:08:13.8605578Z warnings.warn( 2022-05-18T05:08:14.2551084Z ok (2.622s) 2022-05-18T05:08:14.2551309Z 2022-05-18T05:08:14.2551714Z ---------------------------------------------------------------------- 2022-05-18T05:08:14.2552062Z Ran 2 tests in 6.851s 2022-05-18T05:08:14.2552230Z 2022-05-18T05:08:14.2552326Z OK 2022-05-18T05:08:14.2552464Z 2022-05-18T05:08:14.2552606Z Generating XML reports... 2022-05-18T05:08:14.2609717Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_input/TEST-TestInput-20220518050807.xml 2022-05-18T05:08:14.5348699Z Running distributed/_shard/sharded_tensor/ops/test_tensor_ops ... [2022-05-18 05:08:14.534351] 2022-05-18T05:08:14.5349499Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/ops/test_tensor_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:08:14.534477] 2022-05-18T05:08:15.4426764Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_tensor_ops 2022-05-18T05:08:15.4443501Z 2022-05-18T05:08:15.4443749Z Running tests... 2022-05-18T05:08:15.4444206Z ---------------------------------------------------------------------- 2022-05-18T05:08:16.9991377Z test_clone (__main__.TestTensorOps) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:17.0393004Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109837 2022-05-18T05:08:17.0506527Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109838 2022-05-18T05:08:17.0622777Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 109839 2022-05-18T05:08:17.0743197Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 109840 2022-05-18T05:08:17.9820980Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:18.0072597Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:18.0407622Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:18.0428882Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:18.2792354Z skip: Need at least 4 CUDA devices (2.835s) 2022-05-18T05:08:18.2938196Z test_deep_copy (__main__.TestTensorOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109973 2022-05-18T05:08:18.3049537Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109974 2022-05-18T05:08:18.3167375Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 109975 2022-05-18T05:08:18.3281829Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 109976 2022-05-18T05:08:19.2783636Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:19.3019175Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:19.3060662Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:19.3596769Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:19.5326803Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:08:19.5471700Z test_detach (__main__.TestTensorOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110109 2022-05-18T05:08:19.5581439Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110110 2022-05-18T05:08:19.5696564Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 110111 2022-05-18T05:08:19.5812364Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 110112 2022-05-18T05:08:20.4719693Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:20.5038932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:20.5119406Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:20.5324651Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:20.6852858Z skip: Need at least 4 CUDA devices (1.152s) 2022-05-18T05:08:20.6999046Z test_set_requires_grad (__main__.TestTensorOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110245 2022-05-18T05:08:20.7109827Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110246 2022-05-18T05:08:20.7227646Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 110247 2022-05-18T05:08:20.7342197Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 110248 2022-05-18T05:08:21.6222588Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:21.6659735Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:21.6665063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:21.6814777Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:21.8382171Z skip: Need at least 4 CUDA devices (1.153s) 2022-05-18T05:08:21.8382399Z 2022-05-18T05:08:21.8382803Z ---------------------------------------------------------------------- 2022-05-18T05:08:21.8383132Z Ran 4 tests in 6.394s 2022-05-18T05:08:21.8383311Z 2022-05-18T05:08:21.8383423Z OK (skipped=4) 2022-05-18T05:08:21.8383794Z 2022-05-18T05:08:21.8383933Z Generating XML reports... 
2022-05-18T05:08:21.8432465Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_tensor_ops/TEST-TestTensorOps-20220518050815.xml 2022-05-18T05:08:22.1170361Z Running distributed/_shard/sharded_tensor/ops/test_linear ... [2022-05-18 05:08:22.116516] 2022-05-18T05:08:22.1171154Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/ops/test_linear.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:08:22.116622] 2022-05-18T05:08:23.0201764Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_linear 2022-05-18T05:08:23.0217806Z 2022-05-18T05:08:23.0218205Z Running tests... 2022-05-18T05:08:23.0218719Z ---------------------------------------------------------------------- 2022-05-18T05:08:24.6078709Z test_sharded_linear_colwise (__main__.TestShardedTensorOpsLinear) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:24.6489760Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110416 2022-05-18T05:08:24.6605238Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110417 2022-05-18T05:08:24.6720475Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 110418 2022-05-18T05:08:24.6838863Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 110419 2022-05-18T05:08:25.5970165Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:25.6205520Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:25.6517006Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:25.6639546Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:25.8885352Z skip: Need at least 4 CUDA devices (2.866s) 2022-05-18T05:08:25.9057110Z test_sharded_linear_errors (__main__.TestShardedTensorOpsLinear) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110552 2022-05-18T05:08:25.9168361Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110553 2022-05-18T05:08:25.9283422Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 110554 2022-05-18T05:08:25.9399468Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 110555 2022-05-18T05:08:26.9174286Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:26.9258865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:26.9638358Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:26.9905745Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:27.1442296Z skip: Need at least 4 CUDA devices (1.255s) 2022-05-18T05:08:27.1591056Z test_sharded_linear_rowwise (__main__.TestShardedTensorOpsLinear) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110688 2022-05-18T05:08:27.1701232Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110689 2022-05-18T05:08:27.1815862Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 110690 2022-05-18T05:08:27.1932849Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 110691 2022-05-18T05:08:28.0793611Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:28.0854916Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:28.0928403Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:28.1392988Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:28.2972426Z skip: Need at least 4 CUDA devices (1.153s) 2022-05-18T05:08:28.2972679Z 2022-05-18T05:08:28.2973063Z ---------------------------------------------------------------------- 2022-05-18T05:08:28.2973414Z Ran 3 tests in 5.275s 2022-05-18T05:08:28.2973583Z 2022-05-18T05:08:28.2973697Z OK (skipped=3) 2022-05-18T05:08:28.2973862Z 2022-05-18T05:08:28.2973991Z Generating XML reports... 2022-05-18T05:08:28.3020853Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_linear/TEST-TestShardedTensorOpsLinear-20220518050823.xml 2022-05-18T05:08:28.5722746Z Running distributed/elastic/timer/local_timer_test ... [2022-05-18 05:08:28.571807] 2022-05-18T05:08:28.5723774Z Executing ['/opt/conda/bin/python', 'distributed/elastic/timer/local_timer_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:08:28.571910] 2022-05-18T05:08:29.4466652Z Test results will be stored in test-reports/python-unittest/distributed.elastic.timer.local_timer_test 2022-05-18T05:08:29.4486659Z 2022-05-18T05:08:29.4486984Z Running tests... 2022-05-18T05:08:29.4487401Z ---------------------------------------------------------------------- 2022-05-18T05:08:29.4497248Z test_acquire_release (__main__.LocalTimerServerTest) 2022-05-18T05:08:31.0480894Z tests that: ... ok (1.599s) 2022-05-18T05:08:31.0488429Z test_expired_timers (__main__.LocalTimerServerTest) 2022-05-18T05:08:31.0507333Z tests that a single expired timer on a process should terminate ... ok (0.003s) 2022-05-18T05:08:31.0520552Z test_valid_timers (__main__.LocalTimerServerTest) 2022-05-18T05:08:31.0538273Z tests that valid timers are processed correctly and the process is left alone ... ok (0.003s) 2022-05-18T05:08:31.0547168Z test_watchdog_call_count (__main__.LocalTimerServerTest) 2022-05-18T05:08:31.1576938Z checks that the watchdog function ran wait/interval +- 1 times ... ok (0.104s) 2022-05-18T05:08:31.1579649Z test_watchdog_empty_queue (__main__.LocalTimerServerTest) 2022-05-18T05:08:31.1687192Z checks that the watchdog can run on an empty queue ... ok (0.011s) 2022-05-18T05:08:31.1877896Z test_client_interaction (__main__.LocalTimerTest) ... ok (0.019s) 2022-05-18T05:08:31.2000619Z test_exception_propagation (__main__.LocalTimerTest) ... ok (0.012s) 2022-05-18T05:08:31.2010217Z test_get_timer_recursive (__main__.LocalTimerTest) 2022-05-18T05:08:32.5559758Z If a function acquires a countdown timer with default scope, ... ok (1.356s) 2022-05-18T05:08:32.6599841Z test_happy_path (__main__.LocalTimerTest) ... ok (0.104s) 2022-05-18T05:08:32.6714239Z test_no_client (__main__.LocalTimerTest) ... 
ok (0.011s) 2022-05-18T05:08:32.8203912Z test_timer (__main__.LocalTimerTest) ... ok (0.149s) 2022-05-18T05:08:32.8435355Z test_get (__main__.MultiprocessingRequestQueueTest) ... ok (0.023s) 2022-05-18T05:08:32.8444601Z test_get_less_than_size (__main__.MultiprocessingRequestQueueTest) 2022-05-18T05:08:33.3576833Z Tests slow producer. ... ok (0.514s) 2022-05-18T05:08:33.3593898Z test_get_size (__main__.MultiprocessingRequestQueueTest) 2022-05-18T05:08:34.2754893Z Creates a "producer" process that enqueues ``n`` elements ... ok (0.917s) 2022-05-18T05:08:34.2759124Z 2022-05-18T05:08:34.2759723Z ---------------------------------------------------------------------- 2022-05-18T05:08:34.2760074Z Ran 14 tests in 4.827s 2022-05-18T05:08:34.2760250Z 2022-05-18T05:08:34.2760348Z OK 2022-05-18T05:08:34.2760488Z 2022-05-18T05:08:34.2760620Z Generating XML reports... 2022-05-18T05:08:34.2822797Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerServerTest-20220518050829.xml 2022-05-18T05:08:34.2831927Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerTest-20220518050829.xml 2022-05-18T05:08:34.2838119Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-MultiprocessingRequestQueueTest-20220518050829.xml 2022-05-18T05:08:34.6498166Z Running distributed/fsdp/test_fsdp_uneven ... [2022-05-18 05:08:34.649327] 2022-05-18T05:08:34.6498926Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_uneven.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:08:34.649432] 2022-05-18T05:08:35.6049054Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_uneven 2022-05-18T05:08:35.6066636Z 2022-05-18T05:08:35.6066781Z Running tests... 2022-05-18T05:08:35.6067638Z ---------------------------------------------------------------------- 2022-05-18T05:08:35.6081536Z test_one_iteration (__main__.TestUnevenParamShard) 2022-05-18T05:08:37.2060923Z Test FSDP with uneven divide of parameter shards. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:37.2460190Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110949 2022-05-18T05:08:37.2577117Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110950 2022-05-18T05:08:38.1509862Z dist init r=1, world=2 2022-05-18T05:08:38.1515194Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:08:38.1636726Z dist init r=0, world=2 2022-05-18T05:08:38.1642485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:08:38.1643809Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:08:38.1720918Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
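The "dist init r=<rank>, world=2" lines and the store-based barrier messages above are each spawned worker process initializing its process group before the FSDP test body runs; the barrier on store_based_barrier_key:1 completes once both ranks have checked in ("with 2 nodes"). A hypothetical standalone equivalent of that initialization step (the tests drive it through their own multiprocess fixtures, so names and rendezvous settings here are assumptions):

    import os
    import torch.distributed as dist

    def init_worker(rank: int, world_size: int = 2):
        # Rendezvous settings; the real test harness supplies its own.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        # init_process_group performs the store-based barrier logged above,
        # returning only after all world_size ranks have joined.
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)

Each of the two PIDs started for test_one_iteration would run this with rank 0 and rank 1 respectively.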
2022-05-18T05:08:39.5328131Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:39.5328677Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:40.0657011Z ok (4.459s) 2022-05-18T05:08:40.0657311Z 2022-05-18T05:08:40.0658009Z ---------------------------------------------------------------------- 2022-05-18T05:08:40.0658473Z Ran 1 test in 4.459s 2022-05-18T05:08:40.0658645Z 2022-05-18T05:08:40.0658748Z OK 2022-05-18T05:08:40.0658866Z 2022-05-18T05:08:40.0658999Z Generating XML reports... 2022-05-18T05:08:40.0716642Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_uneven/TEST-TestUnevenParamShard-20220518050835.xml 2022-05-18T05:08:40.3506070Z Running distributed/fsdp/test_fsdp_pure_fp16 ... [2022-05-18 05:08:40.350143] 2022-05-18T05:08:40.3506809Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_pure_fp16.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:08:40.350249] 2022-05-18T05:08:41.2779194Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16 2022-05-18T05:08:41.2797126Z 2022-05-18T05:08:41.2797295Z Running tests... 2022-05-18T05:08:41.2797980Z ---------------------------------------------------------------------- 2022-05-18T05:08:42.8575139Z test_pure_fp16_cpu_offload_CPUOffload(offload_params=False) (__main__.TestPureFP16) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:42.8718407Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/73315 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. (1.592s) 2022-05-18T05:08:42.8985363Z test_pure_fp16_cpu_offload_CPUOffload(offload_params=True) (__main__.TestPureFP16) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111067 2022-05-18T05:08:42.9096663Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111068 2022-05-18T05:08:43.8248205Z dist init r=0, world=2 2022-05-18T05:08:43.8251774Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:08:43.8570432Z dist init r=1, world=2 2022-05-18T05:08:43.8574686Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:08:43.8575811Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:08:43.8660324Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:08:45.2022825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:45.2023379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:45.7176374Z ok (2.846s) 2022-05-18T05:08:45.7176603Z 2022-05-18T05:08:45.7177015Z ---------------------------------------------------------------------- 2022-05-18T05:08:45.7177345Z Ran 2 tests in 4.438s 2022-05-18T05:08:45.7177513Z 2022-05-18T05:08:45.7177639Z OK (skipped=1) 2022-05-18T05:08:45.7177818Z 2022-05-18T05:08:45.7177947Z Generating XML reports... 
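The two TestPureFP16 cases above are parametrized over FSDP's CPU-offload option: the offload_params=False variant is skipped via the disabled-tests list (issue 73315), while the offload_params=True variant runs on two ranks and passes. A hedged sketch of what that kind of wrapping might look like, using a hypothetical helper rather than the test's actual code:

    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

    def wrap_pure_fp16(module: nn.Module, offload_params: bool) -> FSDP:
        # Cast to fp16, move to the local GPU, then shard with FSDP;
        # offload_params=True keeps sharded parameters on CPU when not in use.
        return FSDP(
            module.half().cuda(),
            cpu_offload=CPUOffload(offload_params=offload_params),
        )

This assumes a CUDA device and an already-initialized process group, as set up by the dist init / store-based barrier steps logged above.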
2022-05-18T05:08:45.7235829Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16/TEST-TestPureFP16-20220518050841.xml 2022-05-18T05:08:45.9927389Z Running distributed/fsdp/test_fsdp_traversal ... [2022-05-18 05:08:45.992178] 2022-05-18T05:08:45.9928700Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_traversal.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:08:45.992285] 2022-05-18T05:08:46.9214652Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_traversal 2022-05-18T05:08:46.9231891Z 2022-05-18T05:08:46.9232044Z Running tests... 2022-05-18T05:08:46.9232470Z ---------------------------------------------------------------------- 2022-05-18T05:08:48.5038785Z test_fsdp_modules (__main__.TestTraversal) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:48.5440077Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111185 2022-05-18T05:08:48.5552957Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111186 2022-05-18T05:08:49.4718083Z dist init r=1, world=2 2022-05-18T05:08:49.4722501Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:08:49.4738390Z dist init r=0, world=2 2022-05-18T05:08:49.4744434Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:08:49.4745259Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:08:49.4825834Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:08:50.8231932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:50.8232440Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:50.8445977Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:08:50.8446531Z warnings.warn( 2022-05-18T05:08:50.8481819Z /opt/conda/lib/python3.9/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:911: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:08:50.8482372Z warnings.warn( 2022-05-18T05:08:51.1629729Z ok (4.239s) 2022-05-18T05:08:51.1629942Z 2022-05-18T05:08:51.1630350Z ---------------------------------------------------------------------- 2022-05-18T05:08:51.1630676Z Ran 1 test in 4.240s 2022-05-18T05:08:51.1630844Z 2022-05-18T05:08:51.1630940Z OK 2022-05-18T05:08:51.1631076Z 2022-05-18T05:08:51.1631218Z Generating XML reports... 2022-05-18T05:08:51.1685546Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_traversal/TEST-TestTraversal-20220518050846.xml 2022-05-18T05:08:51.4424732Z Running distributed/_shard/sharded_tensor/ops/test_embedding ... [2022-05-18 05:08:51.441942] 2022-05-18T05:08:51.4425532Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/ops/test_embedding.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:08:51.442043] 2022-05-18T05:08:52.3550197Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding 2022-05-18T05:08:52.3567125Z 2022-05-18T05:08:52.3567610Z Running tests... 2022-05-18T05:08:52.3568117Z ---------------------------------------------------------------------- 2022-05-18T05:08:53.9400186Z test_sharded_embedding_colwise (__main__.TestShardedEmbedding) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:53.9810562Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111299 2022-05-18T05:08:53.9928438Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111300 2022-05-18T05:08:54.0047259Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 111301 2022-05-18T05:08:54.0166351Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 111302 2022-05-18T05:08:54.9722209Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:55.0070354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:55.0208015Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:55.0319487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:55.2215313Z skip: Need at least 4 CUDA devices (2.864s) 2022-05-18T05:08:55.2368657Z test_sharded_embedding_rowwise (__main__.TestShardedEmbedding) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111435 2022-05-18T05:08:55.2475630Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111436 2022-05-18T05:08:55.2590210Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 111437 2022-05-18T05:08:55.2705523Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 111438 2022-05-18T05:08:56.2059418Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:56.2268901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:56.2371135Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:08:56.2410748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:08:56.4748532Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:08:56.4748790Z 2022-05-18T05:08:56.4749192Z ---------------------------------------------------------------------- 2022-05-18T05:08:56.4749856Z Ran 2 tests in 4.118s 2022-05-18T05:08:56.4750032Z 2022-05-18T05:08:56.4750149Z OK (skipped=2) 2022-05-18T05:08:56.4750310Z 2022-05-18T05:08:56.4750441Z Generating XML reports... 2022-05-18T05:08:56.4801363Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding/TEST-TestShardedEmbedding-20220518050852.xml 2022-05-18T05:08:56.7563897Z Running distributed/_shard/sharded_tensor/ops/test_chunk ... [2022-05-18 05:08:56.755925] 2022-05-18T05:08:56.7564686Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/ops/test_chunk.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:08:56.756025] 2022-05-18T05:08:57.6633090Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_chunk 2022-05-18T05:08:57.6650110Z 2022-05-18T05:08:57.6650324Z Running tests... 2022-05-18T05:08:57.6650907Z ---------------------------------------------------------------------- 2022-05-18T05:08:59.2503612Z test_sharded_chunk (__main__.TestShardedTensorChunkOps) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:59.2902596Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111606 2022-05-18T05:08:59.3015910Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111607 2022-05-18T05:08:59.3131005Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 111608 2022-05-18T05:08:59.3248183Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 111609 2022-05-18T05:09:00.2046614Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:09:00.2085508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:00.2358220Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:00.2924621Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:09:00.4292899Z skip: Need at least 4 CUDA devices (2.764s) 2022-05-18T05:09:00.4452075Z test_sharded_chunk_error (__main__.TestShardedTensorChunkOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111742 2022-05-18T05:09:00.4562969Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111743 2022-05-18T05:09:00.4676406Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 111744 2022-05-18T05:09:00.4796978Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 111745 2022-05-18T05:09:01.3844354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:01.4278505Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:01.4544784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:09:01.4599755Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:09:01.6839873Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:09:01.6840997Z 2022-05-18T05:09:01.6841423Z ---------------------------------------------------------------------- 2022-05-18T05:09:01.6841780Z Ran 2 tests in 4.019s 2022-05-18T05:09:01.6841936Z 2022-05-18T05:09:01.6842056Z OK (skipped=2) 2022-05-18T05:09:01.6842214Z 2022-05-18T05:09:01.6842342Z Generating XML reports... 2022-05-18T05:09:01.6886771Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_chunk/TEST-TestShardedTensorChunkOps-20220518050857.xml 2022-05-18T05:09:01.9675578Z Running distributed/_shard/sharded_tensor/ops/test_softmax ... [2022-05-18 05:09:01.966985] 2022-05-18T05:09:01.9676946Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/ops/test_softmax.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:09:01.967090] 2022-05-18T05:09:02.8534596Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpawrfw0h_ 2022-05-18T05:09:02.8535663Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpawrfw0h_/_remote_module_non_scriptable.py 2022-05-18T05:09:02.8694966Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax 2022-05-18T05:09:02.8713936Z 2022-05-18T05:09:02.8714356Z Running tests... 2022-05-18T05:09:02.8714884Z ---------------------------------------------------------------------- 2022-05-18T05:09:04.4583356Z test_sharded_softmax_basic (__main__.TestShardedSoftmax) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:04.4994409Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111913 2022-05-18T05:09:04.5109329Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111914 2022-05-18T05:09:04.5227492Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 111915 2022-05-18T05:09:04.5347682Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 111916 2022-05-18T05:09:05.4465142Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvkiy63k8 2022-05-18T05:09:05.4466168Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvkiy63k8/_remote_module_non_scriptable.py 2022-05-18T05:09:05.4595216Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:05.4733974Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxihluic_ 2022-05-18T05:09:05.4736569Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxihluic_/_remote_module_non_scriptable.py 2022-05-18T05:09:05.4869471Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:09:05.4888430Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppeh8d5tq 2022-05-18T05:09:05.4890994Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp45t6bpek 2022-05-18T05:09:05.4891711Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppeh8d5tq/_remote_module_non_scriptable.py 2022-05-18T05:09:05.4893773Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp45t6bpek/_remote_module_non_scriptable.py 2022-05-18T05:09:05.5027830Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:05.5031668Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:09:05.7395106Z skip: Need at least 4 CUDA devices (2.868s) 2022-05-18T05:09:05.7533755Z test_sharded_softmax_on_sharding_dim (__main__.TestShardedSoftmax) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 112049 2022-05-18T05:09:05.7645699Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 112050 2022-05-18T05:09:05.7761784Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 112051 2022-05-18T05:09:05.7879958Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 112052 2022-05-18T05:09:06.7616067Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpou63byp7 2022-05-18T05:09:06.7617159Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpou63byp7/_remote_module_non_scriptable.py 2022-05-18T05:09:06.7690445Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk1d198mj 2022-05-18T05:09:06.7692969Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk1d198mj/_remote_module_non_scriptable.py 2022-05-18T05:09:06.7745732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:06.7756572Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa6qnjpr7 2022-05-18T05:09:06.7760049Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa6qnjpr7/_remote_module_non_scriptable.py 2022-05-18T05:09:06.7822439Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:06.7896802Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:09:06.8053729Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeyed7vl6 2022-05-18T05:09:06.8055025Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeyed7vl6/_remote_module_non_scriptable.py 2022-05-18T05:09:06.8196560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:09:06.9922203Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:09:06.9922466Z 2022-05-18T05:09:06.9922846Z ---------------------------------------------------------------------- 2022-05-18T05:09:06.9923175Z Ran 2 tests in 4.121s 2022-05-18T05:09:06.9923339Z 2022-05-18T05:09:06.9923448Z OK (skipped=2) 2022-05-18T05:09:06.9923604Z 2022-05-18T05:09:06.9923732Z Generating XML reports... 2022-05-18T05:09:06.9968070Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax/TEST-TestShardedSoftmax-20220518050902.xml 2022-05-18T05:09:07.2636350Z Running distributed/test_data_parallel ... [2022-05-18 05:09:07.263151] 2022-05-18T05:09:07.2637078Z Executing ['/opt/conda/bin/python', 'distributed/test_data_parallel.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:09:07.263252] 2022-05-18T05:09:09.7629223Z Test results will be stored in test-reports/python-unittest/distributed.test_data_parallel 2022-05-18T05:09:09.7651727Z 2022-05-18T05:09:09.7651964Z Running tests... 2022-05-18T05:09:09.7652838Z ---------------------------------------------------------------------- 2022-05-18T05:09:11.1289010Z test_autocast (__main__.TestDataParallel) ... ok (1.363s) 2022-05-18T05:09:11.1734678Z test_data_parallel (__main__.TestDataParallel) ... ok (0.044s) 2022-05-18T05:09:11.1856456Z test_data_parallel_buffers_requiring_grad (__main__.TestDataParallel) ... ok (0.012s) 2022-05-18T05:09:11.2159841Z test_data_parallel_complex (__main__.TestDataParallel) ... 
ok (0.030s) 2022-05-18T05:09:11.2222039Z test_data_parallel_device_args (__main__.TestDataParallel) ... ok (0.006s) 2022-05-18T05:09:11.2282160Z test_data_parallel_function_deletion (__main__.TestDataParallel) ... ok (0.006s) 2022-05-18T05:09:11.2296595Z test_data_parallel_lazy_linear (__main__.TestDataParallel) ... /opt/conda/lib/python3.9/site-packages/torch/nn/modules/lazy.py:178: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment. 2022-05-18T05:09:11.2297457Z warnings.warn('Lazy modules are a new feature under heavy development ' 2022-05-18T05:09:11.2306736Z ok (0.002s) 2022-05-18T05:09:11.2346753Z test_data_parallel_model_device (__main__.TestDataParallel) 2022-05-18T05:09:11.2643629Z Test device[0] check at forward time. ... ok (0.034s) 2022-05-18T05:09:11.3180201Z test_data_parallel_model_no_refcycles (__main__.TestDataParallel) ... ok (0.053s) 2022-05-18T05:09:11.3228209Z test_data_parallel_module_zero_inputs (__main__.TestDataParallel) ... ok (0.005s) 2022-05-18T05:09:11.3288786Z test_data_parallel_multiple_input (__main__.TestDataParallel) ... /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/comm.py:231: UserWarning: Using -1 to represent CPU tensor is deprecated. Please use a device object or string instead, e.g., "cpu". 2022-05-18T05:09:11.3289387Z warnings.warn( 2022-05-18T05:09:11.3448221Z ok (0.022s) 2022-05-18T05:09:11.3478752Z test_data_parallel_nested_input (__main__.TestDataParallel) ... ok (0.003s) 2022-05-18T05:09:11.3539618Z test_data_parallel_nested_output (__main__.TestDataParallel) ... ok (0.006s) 2022-05-18T05:09:11.3578693Z test_data_parallel_no_grad (__main__.TestDataParallel) ... ok (0.004s) 2022-05-18T05:09:12.6580867Z test_data_parallel_rnn (__main__.TestDataParallel) ... ok (1.300s) 2022-05-18T05:09:12.6612700Z test_data_parallel_small_back (__main__.TestDataParallel) ... ok (0.003s) 2022-05-18T05:09:12.6729817Z test_data_parallel_sparse (__main__.TestDataParallel) ... ok (0.012s) 2022-05-18T05:09:12.6955431Z test_gather_cpu (__main__.TestDataParallel) ... /opt/conda/lib/python3.9/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector. 2022-05-18T05:09:12.6956164Z warnings.warn('Was asked to gather along dimension 0, but all ' 2022-05-18T05:09:12.7169398Z ok (0.044s) 2022-05-18T05:09:12.7181396Z test_gather_different_len_dicts (__main__.TestDataParallel) ... ok (0.001s) 2022-05-18T05:09:12.7622610Z test_gather_gpu (__main__.TestDataParallel) ... ok (0.044s) 2022-05-18T05:09:12.7675251Z test_parallel_apply (__main__.TestDataParallel) ... ok (0.005s) 2022-05-18T05:09:12.7731527Z test_parallel_apply_autocast (__main__.TestDataParallel) ... ok (0.006s) 2022-05-18T05:09:12.7753378Z test_parallel_apply_passes_exception (__main__.TestDataParallel) ... ok (0.002s) 2022-05-18T05:09:12.7829367Z test_parameter_list_dict_replica (__main__.TestDataParallel) ... ok (0.007s) 2022-05-18T05:09:12.7873152Z test_replicate (__main__.TestDataParallel) ... ok (0.004s) 2022-05-18T05:09:12.7906569Z test_replicate_buffers (__main__.TestDataParallel) ... ok (0.003s) 2022-05-18T05:09:12.7939290Z test_save_replica_module (__main__.TestDataParallel) ... ok (0.003s) 2022-05-18T05:09:12.8122984Z test_scatter_cpu (__main__.TestDataParallel) ... ok (0.018s) 2022-05-18T05:09:12.8311967Z test_scatter_gpu (__main__.TestDataParallel) ... 
ok (0.019s) 2022-05-18T05:09:13.0199929Z test_strided_grad_layout (__main__.TestDataParallel) ... ok (0.189s) 2022-05-18T05:09:13.0262498Z test_zero_grad (__main__.TestDataParallel) ... ok (0.006s) 2022-05-18T05:09:13.0316470Z test_data_parallel_module_cuda_float16 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.005s) 2022-05-18T05:09:13.0368146Z test_data_parallel_module_cuda_float32 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.005s) 2022-05-18T05:09:13.0416368Z test_data_parallel_module_cuda_float64 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.005s) 2022-05-18T05:09:13.1837226Z test_data_parallel_module_kwargs_only_cuda_float16 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.142s) 2022-05-18T05:09:13.2158174Z test_data_parallel_module_kwargs_only_cuda_float32 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.032s) 2022-05-18T05:09:13.2452916Z test_data_parallel_module_kwargs_only_cuda_float64 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.029s) 2022-05-18T05:09:13.2750527Z test_data_parallel_module_kwargs_only_empty_dict_cuda_float16 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.030s) 2022-05-18T05:09:13.3048079Z test_data_parallel_module_kwargs_only_empty_dict_cuda_float32 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.030s) 2022-05-18T05:09:13.3341813Z test_data_parallel_module_kwargs_only_empty_dict_cuda_float64 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.029s) 2022-05-18T05:09:13.3638263Z test_data_parallel_module_kwargs_only_empty_list_cuda_float16 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.030s) 2022-05-18T05:09:13.3652968Z test_data_parallel_module_kwargs_only_empty_list_cuda_float32 (__main__.TestDataParallelDeviceTypeCUDA) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/73923 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2022-05-18T05:09:13.3949903Z test_data_parallel_module_kwargs_only_empty_list_cuda_float64 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.030s) 2022-05-18T05:09:13.4257249Z test_data_parallel_module_kwargs_only_empty_tuple_cuda_float16 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.031s) 2022-05-18T05:09:13.4559086Z test_data_parallel_module_kwargs_only_empty_tuple_cuda_float32 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.030s) 2022-05-18T05:09:13.4854739Z test_data_parallel_module_kwargs_only_empty_tuple_cuda_float64 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.029s) 2022-05-18T05:09:13.4855235Z 2022-05-18T05:09:13.4855673Z ---------------------------------------------------------------------- 2022-05-18T05:09:13.4856014Z Ran 46 tests in 3.720s 2022-05-18T05:09:13.4856176Z 2022-05-18T05:09:13.4856267Z OK (skipped=1) 2022-05-18T05:09:13.4856424Z 2022-05-18T05:09:13.4856546Z Generating XML reports... 2022-05-18T05:09:13.4919302Z Generated XML report: test-reports/python-unittest/distributed.test_data_parallel/TEST-TestDataParallel-20220518050909.xml 2022-05-18T05:09:13.4935587Z Generated XML report: test-reports/python-unittest/distributed.test_data_parallel/TEST-TestDataParallelDeviceTypeCUDA-20220518050909.xml 2022-05-18T05:09:13.8715798Z Running distributed/fsdp/test_flatten_params_wrapper ... 
[2022-05-18 05:09:13.871127] 2022-05-18T05:09:13.8716564Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_flatten_params_wrapper.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:09:13.871234] 2022-05-18T05:09:14.8010292Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_flatten_params_wrapper 2022-05-18T05:09:14.8029665Z 2022-05-18T05:09:14.8030074Z Running tests... 2022-05-18T05:09:14.8030543Z ---------------------------------------------------------------------- 2022-05-18T05:09:16.4102960Z test_empty_module (__main__.TestFlattenParams) ... ok (1.607s) 2022-05-18T05:09:16.4207248Z test_flatten_nothing (__main__.TestFlattenParams) ... ok (0.010s) 2022-05-18T05:09:16.4326795Z test_num_params (__main__.TestFlattenParams) ... ok (0.011s) 2022-05-18T05:09:16.4592039Z test_output (__main__.TestFlattenParams) ... ok (0.027s) 2022-05-18T05:09:16.4722976Z test_partial_flattening (__main__.TestFlattenParams) ... ok (0.013s) 2022-05-18T05:09:16.4834594Z test_sharded_flat_param (__main__.TestFlattenParams) ... ok (0.011s) 2022-05-18T05:09:16.4942815Z test_shared_params_num_params (__main__.TestFlattenParams) ... ok (0.011s) 2022-05-18T05:09:16.5171734Z test_shared_params_output (__main__.TestFlattenParams) ... ok (0.023s) 2022-05-18T05:09:16.5615225Z test_shared_params_pnorm_after_step (__main__.TestFlattenParams) ... ok (0.044s) 2022-05-18T05:09:16.5630583Z test_empty_module (__main__.TestFlattenParamsCUDA) ... ok (0.001s) 2022-05-18T05:09:16.5746503Z test_flatten_nothing (__main__.TestFlattenParamsCUDA) ... ok (0.012s) 2022-05-18T05:09:16.5878177Z test_num_params (__main__.TestFlattenParamsCUDA) ... ok (0.013s) 2022-05-18T05:09:16.7786030Z test_output (__main__.TestFlattenParamsCUDA) ... ok (0.190s) 2022-05-18T05:09:16.7951335Z test_partial_flattening (__main__.TestFlattenParamsCUDA) ... ok (0.017s) 2022-05-18T05:09:16.8059255Z test_sharded_flat_param (__main__.TestFlattenParamsCUDA) ... ok (0.011s) 2022-05-18T05:09:16.8192830Z test_shared_params_num_params (__main__.TestFlattenParamsCUDA) ... ok (0.013s) 2022-05-18T05:09:16.8440081Z test_shared_params_output (__main__.TestFlattenParamsCUDA) ... ok (0.025s) 2022-05-18T05:09:16.8963733Z test_shared_params_pnorm_after_step (__main__.TestFlattenParamsCUDA) ... ok (0.052s) 2022-05-18T05:09:16.8979422Z test_empty_module (__main__.TestFlattenParamsCUDAHalf) ... ok (0.001s) 2022-05-18T05:09:16.9111337Z test_flatten_nothing (__main__.TestFlattenParamsCUDAHalf) ... ok (0.013s) 2022-05-18T05:09:16.9269210Z test_num_params (__main__.TestFlattenParamsCUDAHalf) ... ok (0.016s) 2022-05-18T05:09:16.9540545Z test_output (__main__.TestFlattenParamsCUDAHalf) ... ok (0.027s) 2022-05-18T05:09:16.9724901Z test_partial_flattening (__main__.TestFlattenParamsCUDAHalf) ... ok (0.018s) 2022-05-18T05:09:16.9831328Z test_sharded_flat_param (__main__.TestFlattenParamsCUDAHalf) ... ok (0.011s) 2022-05-18T05:09:16.9984952Z test_shared_params_num_params (__main__.TestFlattenParamsCUDAHalf) ... ok (0.015s) 2022-05-18T05:09:17.0255817Z test_shared_params_output (__main__.TestFlattenParamsCUDAHalf) ... ok (0.027s) 2022-05-18T05:09:17.0815420Z test_shared_params_pnorm_after_step (__main__.TestFlattenParamsCUDAHalf) ... 
ok (0.056s) 2022-05-18T05:09:17.0815957Z 2022-05-18T05:09:17.0816390Z ---------------------------------------------------------------------- 2022-05-18T05:09:17.0816732Z Ran 27 tests in 2.279s 2022-05-18T05:09:17.0816898Z 2022-05-18T05:09:17.0818384Z OK 2022-05-18T05:09:17.0818574Z 2022-05-18T05:09:17.0818905Z Generating XML reports... 2022-05-18T05:09:17.0859670Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_flatten_params_wrapper/TEST-TestFlattenParams-20220518050914.xml 2022-05-18T05:09:17.0871416Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_flatten_params_wrapper/TEST-TestFlattenParamsCUDA-20220518050914.xml 2022-05-18T05:09:17.0882225Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_flatten_params_wrapper/TEST-TestFlattenParamsCUDAHalf-20220518050914.xml 2022-05-18T05:09:17.3518061Z Running distributed/elastic/utils/logging_test ... [2022-05-18 05:09:17.351327] 2022-05-18T05:09:17.3518793Z Executing ['/opt/conda/bin/python', 'distributed/elastic/utils/logging_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:09:17.351429] 2022-05-18T05:09:18.2724424Z Test results will be stored in test-reports/python-unittest/distributed.elastic.utils.logging_test 2022-05-18T05:09:18.2740165Z 2022-05-18T05:09:18.2740408Z Running tests... 2022-05-18T05:09:18.2740840Z ---------------------------------------------------------------------- 2022-05-18T05:09:19.8481602Z test_derive_module_name (__main__.LoggingTest) ... ok (1.574s) 2022-05-18T05:09:19.8503070Z test_logger_name (__main__.LoggingTest) ... ok (0.002s) 2022-05-18T05:09:19.8503306Z 2022-05-18T05:09:19.8503772Z ---------------------------------------------------------------------- 2022-05-18T05:09:19.8504527Z Ran 2 tests in 1.576s 2022-05-18T05:09:19.8504700Z 2022-05-18T05:09:19.8504816Z OK 2022-05-18T05:09:19.8504953Z 2022-05-18T05:09:19.8505082Z Generating XML reports... 2022-05-18T05:09:19.8536286Z Generated XML report: test-reports/python-unittest/distributed.elastic.utils.logging_test/TEST-LoggingTest-20220518050918.xml 2022-05-18T05:09:20.0820244Z Running distributed/elastic/metrics/api_test ... [2022-05-18 05:09:20.081548] 2022-05-18T05:09:20.0820989Z Executing ['/opt/conda/bin/python', 'distributed/elastic/metrics/api_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:09:20.081651] 2022-05-18T05:09:20.9534074Z Test results will be stored in test-reports/python-unittest/distributed.elastic.metrics.api_test 2022-05-18T05:09:20.9549944Z 2022-05-18T05:09:20.9550182Z Running tests... 2022-05-18T05:09:20.9550605Z ---------------------------------------------------------------------- 2022-05-18T05:09:22.5662503Z test_get_metric_name (__main__.MetricsApiTest) ... ok (1.611s) 2022-05-18T05:09:22.5676650Z test_inheritance (__main__.MetricsApiTest) ... ok (0.001s) 2022-05-18T05:09:22.5697452Z test_profile (__main__.MetricsApiTest) ... ok (0.002s) 2022-05-18T05:09:22.5698285Z 2022-05-18T05:09:22.5698809Z ---------------------------------------------------------------------- 2022-05-18T05:09:22.5699549Z Ran 3 tests in 1.615s 2022-05-18T05:09:22.5699885Z 2022-05-18T05:09:22.5700038Z OK 2022-05-18T05:09:22.5700190Z 2022-05-18T05:09:22.5700320Z Generating XML reports... 2022-05-18T05:09:22.5735374Z Generated XML report: test-reports/python-unittest/distributed.elastic.metrics.api_test/TEST-MetricsApiTest-20220518050920.xml 2022-05-18T05:09:22.8168070Z Running distributed/test_nccl ... 
[2022-05-18 05:09:22.816257] 2022-05-18T05:09:22.8168879Z Executing ['/opt/conda/bin/python', 'distributed/test_nccl.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:09:22.816361] 2022-05-18T05:09:25.2912623Z Test results will be stored in test-reports/python-unittest/distributed.test_nccl 2022-05-18T05:09:25.2933623Z 2022-05-18T05:09:25.2934083Z Running tests... 2022-05-18T05:09:25.2934535Z ---------------------------------------------------------------------- 2022-05-18T05:09:26.3699496Z test_all_gather_cuda_float32 (__main__.TestNCCLCUDA) ... ok (1.076s) 2022-05-18T05:09:26.3883064Z test_all_reduce_cuda_float32 (__main__.TestNCCLCUDA) ... ok (0.018s) 2022-05-18T05:09:26.3926974Z test_broadcast_cuda_float32 (__main__.TestNCCLCUDA) ... ok (0.004s) 2022-05-18T05:09:26.3956416Z test_collective_errors_cuda (__main__.TestNCCLCUDA) ... ok (0.003s) 2022-05-18T05:09:26.3992571Z test_reduce_cuda_float32 (__main__.TestNCCLCUDA) ... ok (0.004s) 2022-05-18T05:09:26.4041521Z test_reduce_scatter_cuda_float32 (__main__.TestNCCLCUDA) ... ok (0.005s) 2022-05-18T05:09:26.4062058Z test_unique_id_cuda (__main__.TestNCCLCUDA) ... ok (0.002s) 2022-05-18T05:09:26.4062781Z 2022-05-18T05:09:26.4063163Z ---------------------------------------------------------------------- 2022-05-18T05:09:26.4063521Z Ran 7 tests in 1.113s 2022-05-18T05:09:26.4063949Z 2022-05-18T05:09:26.4064047Z OK 2022-05-18T05:09:26.4064187Z 2022-05-18T05:09:26.4064323Z Generating XML reports... 2022-05-18T05:09:26.4103127Z Generated XML report: test-reports/python-unittest/distributed.test_nccl/TEST-TestNCCLCUDA-20220518050925.xml 2022-05-18T05:09:26.7051837Z Running distributed/_shard/sharded_tensor/ops/test_math_ops ... [2022-05-18 05:09:26.704648] 2022-05-18T05:09:26.7052640Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/ops/test_math_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:09:26.704753] 2022-05-18T05:09:27.7204699Z Running distributed/_shard/test_replicated_tensor ... [2022-05-18 05:09:27.719976] 2022-05-18T05:09:27.7205476Z Executing ['/opt/conda/bin/python', 'distributed/_shard/test_replicated_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:09:27.720076] 2022-05-18T05:09:28.7349813Z Running distributed/elastic/events/lib_test ... [2022-05-18 05:09:28.734503] 2022-05-18T05:09:28.7350495Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/elastic/events/lib_test.py', '-v'] ... [2022-05-18 05:09:28.734605] 2022-05-18T05:09:29.5318770Z ============================= test session starts ============================== 2022-05-18T05:09:29.5319342Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:09:29.5371503Z cachedir: .pytest_cache 2022-05-18T05:09:29.5372093Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:09:29.5372834Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:09:29.5373240Z plugins: hypothesis-4.53.2 2022-05-18T05:09:30.3100573Z collecting ...  
2022-05-18T05:09:30.3112377Z collecting 3 items  2022-05-18T05:09:30.3112838Z collected 8 items  2022-05-18T05:09:30.3117352Z 2022-05-18T05:09:30.3134577Z distributed/elastic/events/lib_test.py::EventLibTest::test_event_created PASSED [ 12%] 2022-05-18T05:09:30.3149908Z distributed/elastic/events/lib_test.py::EventLibTest::test_event_deser PASSED [ 25%] 2022-05-18T05:09:30.3168312Z distributed/elastic/events/lib_test.py::EventLibTest::test_get_or_create_logger PASSED [ 37%] 2022-05-18T05:09:30.3859836Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event PASSED [ 50%] 2022-05-18T05:09:30.3879282Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event_does_not_run_if_invalid_dest PASSED [ 62%] 2022-05-18T05:09:30.3892198Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_created PASSED [ 75%] 2022-05-18T05:09:30.3906427Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_deserialize PASSED [ 87%] 2022-05-18T05:09:30.3926797Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_str PASSED [100%] 2022-05-18T05:09:30.3930171Z 2022-05-18T05:09:30.3930758Z ============================== 8 passed in 0.86s =============================== 2022-05-18T05:09:30.5456247Z Running distributed/fsdp/test_shard_utils ... [2022-05-18 05:09:30.545161] 2022-05-18T05:09:30.5457048Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_shard_utils.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:09:30.545270] 2022-05-18T05:09:31.5764519Z Running distributed/pipeline/sync/skip/test_gpipe ... [2022-05-18 05:09:31.576009] 2022-05-18T05:09:31.5765233Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_gpipe.py', '-v'] ... [2022-05-18 05:09:31.576112] 2022-05-18T05:09:32.8185939Z ============================= test session starts ============================== 2022-05-18T05:09:32.8205993Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:09:32.8206368Z cachedir: .pytest_cache 2022-05-18T05:09:32.8206940Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:09:32.8207386Z torch: 1.12.0a0+git3b23752 2022-05-18T05:09:32.8207720Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:09:32.8208083Z plugins: hypothesis-4.53.2 2022-05-18T05:09:32.8488619Z collecting ...  
2022-05-18T05:09:32.8489078Z collected 13 items  2022-05-18T05:09:32.8493563Z 2022-05-18T05:09:34.7619255Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-3] PASSED [ 7%] 2022-05-18T05:09:36.2251027Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-1:2] PASSED [ 15%] 2022-05-18T05:09:36.2855993Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-2:1] PASSED [ 23%] 2022-05-18T05:09:36.3005742Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-1:1:1] SKIPPED [ 30%] 2022-05-18T05:09:36.3761573Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-3] PASSED [ 38%] 2022-05-18T05:09:36.4388166Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-1:2] PASSED [ 46%] 2022-05-18T05:09:36.5004567Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-2:1] PASSED [ 53%] 2022-05-18T05:09:36.5154394Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-1:1:1] SKIPPED [ 61%] 2022-05-18T05:09:36.5798924Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-3] PASSED [ 69%] 2022-05-18T05:09:36.6391503Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-1:2] PASSED [ 76%] 2022-05-18T05:09:36.6969626Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-2:1] PASSED [ 84%] 2022-05-18T05:09:36.7118183Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-1:1:1] SKIPPED [ 92%] 2022-05-18T05:09:36.7382109Z distributed/pipeline/sync/skip/test_gpipe.py::test_none_skip PASSED [100%] 2022-05-18T05:09:36.7386373Z 2022-05-18T05:09:36.7386911Z =========================== short test summary info ============================ 2022-05-18T05:09:36.7387387Z SKIPPED [3] distributed/pipeline/sync/skip/test_gpipe.py:24: at least 3 cuda devices required 2022-05-18T05:09:36.7388024Z ======================== 10 passed, 3 skipped in 3.92s ========================= 2022-05-18T05:09:37.0632271Z Running distributed/pipeline/sync/skip/test_leak ... [2022-05-18 05:09:37.062685] 2022-05-18T05:09:37.0632936Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_leak.py', '-v'] ... [2022-05-18 05:09:37.062793] 2022-05-18T05:09:38.3225877Z ============================= test session starts ============================== 2022-05-18T05:09:38.3226428Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:09:38.3245992Z cachedir: .pytest_cache 2022-05-18T05:09:38.3246580Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:09:38.3247038Z torch: 1.12.0a0+git3b23752 2022-05-18T05:09:38.3247393Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:09:38.3247778Z plugins: hypothesis-4.53.2 2022-05-18T05:09:38.3426901Z collecting ...  
2022-05-18T05:09:38.3427319Z collected 8 items  2022-05-18T05:09:38.3431472Z 2022-05-18T05:09:38.4369066Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[always-train] PASSED [ 12%] 2022-05-18T05:09:38.4551921Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[always-eval] PASSED [ 25%] 2022-05-18T05:09:38.4759097Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[except_last-train] PASSED [ 37%] 2022-05-18T05:09:38.4940898Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[except_last-eval] PASSED [ 50%] 2022-05-18T05:09:38.5263258Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[never-train] PASSED [ 62%] 2022-05-18T05:09:38.5445445Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[never-eval] PASSED [ 75%] 2022-05-18T05:09:38.5598342Z distributed/pipeline/sync/skip/test_leak.py::test_no_portal_without_pipe[train] PASSED [ 87%] 2022-05-18T05:09:38.5754566Z distributed/pipeline/sync/skip/test_leak.py::test_no_portal_without_pipe[eval] PASSED [100%] 2022-05-18T05:09:38.5755573Z 2022-05-18T05:09:38.5755899Z ============================== 8 passed in 0.25s =============================== 2022-05-18T05:09:38.7220267Z Running distributed/pipeline/sync/skip/test_stash_pop ... [2022-05-18 05:09:38.721550] 2022-05-18T05:09:38.7220947Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_stash_pop.py', '-v'] ... [2022-05-18 05:09:38.721658] 2022-05-18T05:09:39.9697030Z ============================= test session starts ============================== 2022-05-18T05:09:39.9697943Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:09:39.9716921Z cachedir: .pytest_cache 2022-05-18T05:09:39.9717743Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:09:39.9718189Z torch: 1.12.0a0+git3b23752 2022-05-18T05:09:39.9718517Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:09:39.9718915Z plugins: hypothesis-4.53.2 2022-05-18T05:09:39.9881643Z collecting ...  2022-05-18T05:09:39.9882175Z collected 7 items  2022-05-18T05:09:39.9885846Z 2022-05-18T05:09:39.9930031Z distributed/pipeline/sync/skip/test_stash_pop.py::test_stash PASSED [ 14%] 2022-05-18T05:09:39.9950156Z distributed/pipeline/sync/skip/test_stash_pop.py::test_pop PASSED [ 28%] 2022-05-18T05:09:39.9971800Z distributed/pipeline/sync/skip/test_stash_pop.py::test_declare_but_not_use PASSED [ 42%] 2022-05-18T05:09:39.9989909Z distributed/pipeline/sync/skip/test_stash_pop.py::test_stash_not_declared PASSED [ 57%] 2022-05-18T05:09:40.0009148Z distributed/pipeline/sync/skip/test_stash_pop.py::test_pop_not_declared PASSED [ 71%] 2022-05-18T05:09:40.0027202Z distributed/pipeline/sync/skip/test_stash_pop.py::test_pop_not_stashed PASSED [ 85%] 2022-05-18T05:09:40.0048662Z distributed/pipeline/sync/skip/test_stash_pop.py::test_stash_none PASSED [100%] 2022-05-18T05:09:40.0049225Z 2022-05-18T05:09:40.0049545Z ============================== 7 passed in 0.04s =============================== 2022-05-18T05:09:40.1409332Z Running distributed/pipeline/sync/skip/test_verify_skippables ... [2022-05-18 05:09:40.140464] 2022-05-18T05:09:40.1410032Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_verify_skippables.py', '-v'] ... 
[2022-05-18 05:09:40.140568] 2022-05-18T05:09:41.3481301Z ============================= test session starts ============================== 2022-05-18T05:09:41.3481882Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:09:41.3501342Z cachedir: .pytest_cache 2022-05-18T05:09:41.3501971Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:09:41.3502424Z torch: 1.12.0a0+git3b23752 2022-05-18T05:09:41.3502737Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:09:41.3503127Z plugins: hypothesis-4.53.2 2022-05-18T05:09:41.3707024Z collecting ...  2022-05-18T05:09:41.3707434Z collected 9 items  2022-05-18T05:09:41.3711449Z 2022-05-18T05:09:41.3745458Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_matching PASSED [ 11%] 2022-05-18T05:09:41.3764517Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_not_pop PASSED [ 22%] 2022-05-18T05:09:41.3783717Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_pop_unknown PASSED [ 33%] 2022-05-18T05:09:41.3803695Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_again PASSED [ 44%] 2022-05-18T05:09:41.3823783Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_pop_again PASSED [ 55%] 2022-05-18T05:09:41.3843932Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_pop_together_different_names PASSED [ 66%] 2022-05-18T05:09:41.3861585Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_pop_together_same_name PASSED [ 77%] 2022-05-18T05:09:41.3884077Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_double_stash_pop PASSED [ 88%] 2022-05-18T05:09:41.3908436Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_double_stash_pop_but_isolated PASSED [100%] 2022-05-18T05:09:41.3909283Z 2022-05-18T05:09:41.3909585Z ============================== 9 passed in 0.04s =============================== 2022-05-18T05:09:41.5369523Z Running distributed/pipeline/sync/test_bugs ... [2022-05-18 05:09:41.536507] 2022-05-18T05:09:41.5370183Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_bugs.py', '-v'] ... [2022-05-18 05:09:41.536611] 2022-05-18T05:09:42.8144540Z ============================= test session starts ============================== 2022-05-18T05:09:42.8145098Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:09:42.8164746Z cachedir: .pytest_cache 2022-05-18T05:09:42.8165342Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:09:42.8165808Z torch: 1.12.0a0+git3b23752 2022-05-18T05:09:42.8166125Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:09:42.8166511Z plugins: hypothesis-4.53.2 2022-05-18T05:09:42.8416320Z collecting ...  
2022-05-18T05:09:42.8416719Z collected 4 items  2022-05-18T05:09:42.8420729Z 2022-05-18T05:09:42.9232023Z distributed/pipeline/sync/test_bugs.py::test_python_autograd_function PASSED [ 25%] 2022-05-18T05:09:42.9407388Z distributed/pipeline/sync/test_bugs.py::test_exception_no_hang PASSED [ 50%] 2022-05-18T05:09:46.6178892Z distributed/pipeline/sync/test_bugs.py::test_tuple_wait PASSED [ 75%] 2022-05-18T05:09:46.7489940Z distributed/pipeline/sync/test_bugs.py::test_parallel_randoms PASSED [100%] 2022-05-18T05:09:46.7490968Z 2022-05-18T05:09:46.7491323Z ============================== 4 passed in 3.93s =============================== 2022-05-18T05:09:47.0119582Z Running distributed/pipeline/sync/test_copy ... [2022-05-18 05:09:47.011422] 2022-05-18T05:09:47.0120248Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_copy.py', '-v'] ... [2022-05-18 05:09:47.011529] 2022-05-18T05:09:48.2511356Z ============================= test session starts ============================== 2022-05-18T05:09:48.2511916Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:09:48.2531781Z cachedir: .pytest_cache 2022-05-18T05:09:48.2532376Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:09:48.2532804Z torch: 1.12.0a0+git3b23752 2022-05-18T05:09:48.2533146Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:09:48.2533524Z plugins: hypothesis-4.53.2 2022-05-18T05:09:48.2772021Z collecting ...  2022-05-18T05:09:48.2772430Z collected 5 items  2022-05-18T05:09:48.2776771Z 2022-05-18T05:09:48.2834539Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cpu_cpu PASSED [ 20%] 2022-05-18T05:09:49.5050910Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cpu_cuda PASSED [ 40%] 2022-05-18T05:09:49.9355652Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cuda_cpu PASSED [ 60%] 2022-05-18T05:09:50.2780519Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cuda_cuda PASSED [ 80%] 2022-05-18T05:09:50.2800146Z distributed/pipeline/sync/test_copy.py::test_wait_multiple_tensors PASSED [100%] 2022-05-18T05:09:50.2802000Z 2022-05-18T05:09:50.2802316Z ============================== 5 passed in 2.03s =============================== 2022-05-18T05:09:50.4976458Z Running distributed/pipeline/sync/test_dependency ... [2022-05-18 05:09:50.497135] 2022-05-18T05:09:50.4977147Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_dependency.py', '-v'] ... [2022-05-18 05:09:50.497241] 2022-05-18T05:09:51.7477882Z ============================= test session starts ============================== 2022-05-18T05:09:51.7478457Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:09:51.7497354Z cachedir: .pytest_cache 2022-05-18T05:09:51.7497943Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:09:51.7498380Z torch: 1.12.0a0+git3b23752 2022-05-18T05:09:51.7498717Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:09:51.7499082Z plugins: hypothesis-4.53.2 2022-05-18T05:09:51.7798790Z collecting ...  
2022-05-18T05:09:51.7799210Z collected 6 items  2022-05-18T05:09:51.7802828Z 2022-05-18T05:09:52.9652034Z distributed/pipeline/sync/test_dependency.py::test_fork_join PASSED [ 16%] 2022-05-18T05:09:52.9664898Z distributed/pipeline/sync/test_dependency.py::test_fork_join_enable_grad PASSED [ 33%] 2022-05-18T05:09:52.9679608Z distributed/pipeline/sync/test_dependency.py::test_fork_join_no_grad PASSED [ 50%] 2022-05-18T05:09:52.9694912Z distributed/pipeline/sync/test_dependency.py::test_fork_leak PASSED [ 66%] 2022-05-18T05:09:52.9708365Z distributed/pipeline/sync/test_dependency.py::test_join_when_fork_not_requires_grad PASSED [ 83%] 2022-05-18T05:09:52.9725170Z distributed/pipeline/sync/test_dependency.py::test_join_when_fork_requires_grad PASSED [100%] 2022-05-18T05:09:52.9726676Z 2022-05-18T05:09:52.9727050Z ============================== 6 passed in 1.23s =============================== 2022-05-18T05:09:53.1799954Z Running distributed/pipeline/sync/test_microbatch ... [2022-05-18 05:09:53.179446] 2022-05-18T05:09:53.1801078Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_microbatch.py', '-v'] ... [2022-05-18 05:09:53.179553] 2022-05-18T05:09:54.4368871Z ============================= test session starts ============================== 2022-05-18T05:09:54.4369489Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:09:54.4389691Z cachedir: .pytest_cache 2022-05-18T05:09:54.4390303Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:09:54.4390746Z torch: 1.12.0a0+git3b23752 2022-05-18T05:09:54.4391066Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:09:54.4391450Z plugins: hypothesis-4.53.2 2022-05-18T05:09:54.4729542Z collecting ...  2022-05-18T05:09:54.4729965Z collected 10 items  2022-05-18T05:09:54.4734319Z 2022-05-18T05:09:54.4769410Z distributed/pipeline/sync/test_microbatch.py::test_batch_atomic PASSED [ 10%] 2022-05-18T05:09:54.4788353Z distributed/pipeline/sync/test_microbatch.py::test_batch_non_atomic PASSED [ 20%] 2022-05-18T05:09:54.4807005Z distributed/pipeline/sync/test_microbatch.py::test_batch_call PASSED [ 30%] 2022-05-18T05:09:54.4826218Z distributed/pipeline/sync/test_microbatch.py::test_batch_setitem_by_index PASSED [ 40%] 2022-05-18T05:09:54.4843809Z distributed/pipeline/sync/test_microbatch.py::test_batch_setitem_by_slice PASSED [ 50%] 2022-05-18T05:09:54.4864053Z distributed/pipeline/sync/test_microbatch.py::test_check PASSED [ 60%] 2022-05-18T05:09:54.4889825Z distributed/pipeline/sync/test_microbatch.py::test_gather_tensors PASSED [ 70%] 2022-05-18T05:09:54.4906794Z distributed/pipeline/sync/test_microbatch.py::test_gather_tuples PASSED [ 80%] 2022-05-18T05:09:54.4924644Z distributed/pipeline/sync/test_microbatch.py::test_scatter_tensor PASSED [ 90%] 2022-05-18T05:09:54.4945034Z distributed/pipeline/sync/test_microbatch.py::test_scatter_multiple_tensors PASSED [100%] 2022-05-18T05:09:54.4947244Z 2022-05-18T05:09:54.4947610Z ============================== 10 passed in 0.06s ============================== 2022-05-18T05:09:54.6384404Z Running distributed/pipeline/sync/test_pipe ... [2022-05-18 05:09:54.637899] 2022-05-18T05:09:54.6385150Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_pipe.py', '-v'] ... 
[2022-05-18 05:09:54.638004] 2022-05-18T05:09:55.8385321Z ============================= test session starts ============================== 2022-05-18T05:09:55.8385932Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:09:55.8405682Z cachedir: .pytest_cache 2022-05-18T05:09:55.8406286Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:09:55.8406742Z torch: 1.12.0a0+git3b23752 2022-05-18T05:09:55.8407090Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:09:55.8407821Z plugins: hypothesis-4.53.2 2022-05-18T05:09:55.9460635Z collecting ...  2022-05-18T05:09:55.9461065Z collected 56 items  2022-05-18T05:09:55.9465071Z 2022-05-18T05:09:55.9508264Z distributed/pipeline/sync/test_pipe.py::test_pipe_without_rpc PASSED [ 1%] 2022-05-18T05:09:56.0280436Z distributed/pipeline/sync/test_pipe.py::test_parameters PASSED [ 3%] 2022-05-18T05:09:56.0432551Z distributed/pipeline/sync/test_pipe.py::test_public_attrs PASSED [ 5%] 2022-05-18T05:09:56.0689014Z distributed/pipeline/sync/test_pipe.py::test_sequential_like PASSED [ 7%] 2022-05-18T05:09:56.0837367Z distributed/pipeline/sync/test_pipe.py::test_chunks_less_than_1 PASSED [ 8%] 2022-05-18T05:09:56.1014447Z distributed/pipeline/sync/test_pipe.py::test_batch_size_indivisible PASSED [ 10%] 2022-05-18T05:09:56.1178228Z distributed/pipeline/sync/test_pipe.py::test_batch_size_small PASSED [ 12%] 2022-05-18T05:09:56.1364182Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode PASSED [ 14%] 2022-05-18T05:09:56.1515413Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode_invalid PASSED [ 16%] 2022-05-18T05:09:56.1677837Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode_when_chunks_1 PASSED [ 17%] 2022-05-18T05:09:56.1951002Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_eval PASSED [ 19%] 2022-05-18T05:09:56.2128551Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_non_float_input PASSED [ 21%] 2022-05-18T05:09:56.2289867Z distributed/pipeline/sync/test_pipe.py::test_no_grad PASSED [ 23%] 2022-05-18T05:09:56.2445641Z distributed/pipeline/sync/test_pipe.py::test_exception PASSED [ 25%] 2022-05-18T05:09:56.4634741Z distributed/pipeline/sync/test_pipe.py::test_exception_early_stop_asap PASSED [ 26%] 2022-05-18T05:09:56.4818346Z distributed/pipeline/sync/test_pipe.py::test_nested_input PASSED [ 28%] 2022-05-18T05:09:56.4994218Z distributed/pipeline/sync/test_pipe.py::test_input_pair PASSED [ 30%] 2022-05-18T05:09:56.5161227Z distributed/pipeline/sync/test_pipe.py::test_multi_sequence_input PASSED [ 32%] 2022-05-18T05:09:56.5329843Z distributed/pipeline/sync/test_pipe.py::test_input_singleton PASSED [ 33%] 2022-05-18T05:09:56.5486573Z distributed/pipeline/sync/test_pipe.py::test_input_varargs PASSED [ 35%] 2022-05-18T05:09:56.5641279Z distributed/pipeline/sync/test_pipe.py::test_non_tensor PASSED [ 37%] 2022-05-18T05:09:56.5804898Z distributed/pipeline/sync/test_pipe.py::test_non_tensor_sequence PASSED [ 39%] 2022-05-18T05:09:56.6043735Z distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[never] PASSED [ 41%] 2022-05-18T05:09:56.6313359Z distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[always] PASSED [ 42%] 2022-05-18T05:09:56.6575708Z distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[except_last] PASSED [ 44%] 2022-05-18T05:09:56.6734455Z distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[never] PASSED [ 46%] 
2022-05-18T05:09:56.6890415Z distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[always] PASSED [ 48%] 2022-05-18T05:09:56.7048299Z distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[except_last] PASSED [ 50%] 2022-05-18T05:09:56.7216713Z distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[never] PASSED [ 51%] 2022-05-18T05:09:56.7394425Z distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[always] PASSED [ 53%] 2022-05-18T05:09:56.7569610Z distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[except_last] PASSED [ 55%] 2022-05-18T05:09:56.7738451Z distributed/pipeline/sync/test_pipe.py::test_no_chunk[never] PASSED [ 57%] 2022-05-18T05:09:56.7916535Z distributed/pipeline/sync/test_pipe.py::test_no_chunk[always] PASSED [ 58%] 2022-05-18T05:09:56.8092901Z distributed/pipeline/sync/test_pipe.py::test_no_chunk[except_last] PASSED [ 60%] 2022-05-18T05:09:56.8346897Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[never] PASSED [ 62%] 2022-05-18T05:09:56.8573582Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[always] PASSED [ 64%] 2022-05-18T05:09:56.8796431Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[except_last] PASSED [ 66%] 2022-05-18T05:09:56.9008780Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm_params[never] PASSED [ 67%] 2022-05-18T05:09:56.9217108Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm_params[always] PASSED [ 69%] 2022-05-18T05:09:56.9373154Z distributed/pipeline/sync/test_pipe.py::test_devices PASSED [ 71%] 2022-05-18T05:09:56.9529158Z distributed/pipeline/sync/test_pipe.py::test_partitions PASSED [ 73%] 2022-05-18T05:09:58.1582917Z distributed/pipeline/sync/test_pipe.py::test_merged_partitions PASSED [ 75%] 2022-05-18T05:09:58.1741729Z distributed/pipeline/sync/test_pipe.py::test_deny_moving PASSED [ 76%] 2022-05-18T05:09:58.1892211Z distributed/pipeline/sync/test_pipe.py::test_empty_module PASSED [ 78%] 2022-05-18T05:09:58.2044650Z distributed/pipeline/sync/test_pipe.py::test_named_children PASSED [ 80%] 2022-05-18T05:09:58.2192382Z distributed/pipeline/sync/test_pipe.py::test_verify_module_non_sequential PASSED [ 82%] 2022-05-18T05:09:58.2342635Z distributed/pipeline/sync/test_pipe.py::test_verify_module_duplicate_children PASSED [ 83%] 2022-05-18T05:09:58.2498415Z distributed/pipeline/sync/test_pipe.py::test_verify_module_params_on_same_device PASSED [ 85%] 2022-05-18T05:09:59.5121774Z distributed/pipeline/sync/test_pipe.py::test_verify_nested_modules PASSED [ 87%] 2022-05-18T05:09:59.5278342Z distributed/pipeline/sync/test_pipe.py::test_verify_module_duplicate_parameters_on_same_device PASSED [ 89%] 2022-05-18T05:09:59.8459486Z distributed/pipeline/sync/test_pipe.py::test_forward_lockstep PASSED [ 91%] 2022-05-18T05:09:59.8630938Z distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[never] PASSED [ 92%] 2022-05-18T05:09:59.8805967Z distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[always] PASSED [ 94%] 2022-05-18T05:09:59.8976558Z distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[except_last] PASSED [ 96%] 2022-05-18T05:09:59.9138550Z distributed/pipeline/sync/test_pipe.py::test_inputs_wrong_device PASSED [ 98%] 2022-05-18T05:09:59.9672439Z distributed/pipeline/sync/test_pipe.py::test_with_device_wrapper PASSED [100%] 2022-05-18T05:09:59.9672746Z 2022-05-18T05:09:59.9673000Z =============================== warnings summary =============================== 2022-05-18T05:09:59.9673407Z 
test/distributed/pipeline/sync/test_pipe.py::test_batch_size_indivisible 2022-05-18T05:09:59.9673874Z test/distributed/pipeline/sync/test_pipe.py::test_batch_size_small 2022-05-18T05:09:59.9674500Z /opt/conda/lib/python3.9/site-packages/_pytest/python.py:192: PytestRemovedIn8Warning: Passing None has been deprecated. 2022-05-18T05:09:59.9675317Z See https://docs.pytest.org/en/latest/how-to/capture-warnings.html#additional-use-cases-of-warnings-in-tests for alternatives in common use cases. 2022-05-18T05:09:59.9676057Z result = testfunction(**testargs) 2022-05-18T05:09:59.9676250Z 2022-05-18T05:09:59.9676555Z -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html 2022-05-18T05:09:59.9677134Z ======================== 56 passed, 2 warnings in 4.13s ======================== 2022-05-18T05:10:00.2488661Z Running distributed/pipeline/sync/test_stream ... [2022-05-18 05:10:00.248394] 2022-05-18T05:10:00.2489411Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_stream.py', '-v'] ... [2022-05-18 05:10:00.248504] 2022-05-18T05:10:01.5045241Z ============================= test session starts ============================== 2022-05-18T05:10:01.5045807Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:10:01.5065512Z cachedir: .pytest_cache 2022-05-18T05:10:01.5066123Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:10:01.5066572Z torch: 1.12.0a0+git3b23752 2022-05-18T05:10:01.5066918Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:10:01.5067305Z plugins: hypothesis-4.53.2 2022-05-18T05:10:01.5468392Z collecting ...  2022-05-18T05:10:01.5468796Z collected 19 items  2022-05-18T05:10:01.5473182Z 2022-05-18T05:10:01.5502540Z distributed/pipeline/sync/test_stream.py::TestNewStream::test_new_stream_cpu PASSED [ 5%] 2022-05-18T05:10:02.7421595Z distributed/pipeline/sync/test_stream.py::TestNewStream::test_new_stream_cuda PASSED [ 10%] 2022-05-18T05:10:02.7434163Z distributed/pipeline/sync/test_stream.py::TestCurrentStream::test_current_stream_cpu PASSED [ 15%] 2022-05-18T05:10:02.7447038Z distributed/pipeline/sync/test_stream.py::TestCurrentStream::test_current_stream_cuda PASSED [ 21%] 2022-05-18T05:10:02.7459288Z distributed/pipeline/sync/test_stream.py::TestDefaultStream::test_default_stream_cpu PASSED [ 26%] 2022-05-18T05:10:02.7472106Z distributed/pipeline/sync/test_stream.py::TestDefaultStream::test_default_stream_cuda PASSED [ 31%] 2022-05-18T05:10:02.7484229Z distributed/pipeline/sync/test_stream.py::TestUseDevice::test_use_device_cpu PASSED [ 36%] 2022-05-18T05:10:02.7496654Z distributed/pipeline/sync/test_stream.py::TestUseDevice::test_use_device_cuda PASSED [ 42%] 2022-05-18T05:10:02.7509089Z distributed/pipeline/sync/test_stream.py::TestUseStream::test_use_stream_cpu PASSED [ 47%] 2022-05-18T05:10:02.7522140Z distributed/pipeline/sync/test_stream.py::TestUseStream::test_use_stream_cuda PASSED [ 52%] 2022-05-18T05:10:02.7534312Z distributed/pipeline/sync/test_stream.py::TestGetDevice::test_get_device_cpu PASSED [ 57%] 2022-05-18T05:10:02.7547094Z distributed/pipeline/sync/test_stream.py::TestGetDevice::test_get_device_cuda PASSED [ 63%] 2022-05-18T05:10:02.7764770Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cpu_cpu PASSED [ 68%] 2022-05-18T05:10:03.2631267Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cpu_cuda PASSED [ 
73%] 2022-05-18T05:10:03.2645730Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cuda_cpu PASSED [ 78%] 2022-05-18T05:10:03.7498049Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cuda_cuda PASSED [ 84%] 2022-05-18T05:10:03.7511393Z distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_cpu PASSED [ 89%] 2022-05-18T05:10:04.2367310Z distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_cuda PASSED [ 94%] 2022-05-18T05:10:04.2391781Z distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_shifted_view PASSED [100%] 2022-05-18T05:10:04.2392823Z 2022-05-18T05:10:04.2393141Z ============================== 19 passed in 2.74s ============================== 2022-05-18T05:10:04.7470858Z Running distributed/pipeline/sync/test_worker ... [2022-05-18 05:10:04.746601] 2022-05-18T05:10:04.7471551Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_worker.py', '-v'] ... [2022-05-18 05:10:04.746708] 2022-05-18T05:10:06.0133128Z ============================= test session starts ============================== 2022-05-18T05:10:06.0133688Z platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:10:06.0154380Z cachedir: .pytest_cache 2022-05-18T05:10:06.0155941Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:10:06.0156889Z torch: 1.12.0a0+git3b23752 2022-05-18T05:10:06.0157554Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:10:06.0158356Z plugins: hypothesis-4.53.2 2022-05-18T05:10:06.0370959Z collecting ...  2022-05-18T05:10:06.0371792Z collected 6 items  2022-05-18T05:10:06.0376643Z 2022-05-18T05:10:06.0415594Z distributed/pipeline/sync/test_worker.py::test_compute_multithreading PASSED [ 16%] 2022-05-18T05:10:06.0440583Z distributed/pipeline/sync/test_worker.py::test_compute_success PASSED [ 33%] 2022-05-18T05:10:06.0461195Z distributed/pipeline/sync/test_worker.py::test_compute_exception PASSED [ 50%] 2022-05-18T05:10:06.0492649Z distributed/pipeline/sync/test_worker.py::test_grad_mode[True] PASSED [ 66%] 2022-05-18T05:10:06.0515351Z distributed/pipeline/sync/test_worker.py::test_grad_mode[False] PASSED [ 83%] 2022-05-18T05:10:06.0543646Z distributed/pipeline/sync/test_worker.py::test_worker_per_device PASSED [100%] 2022-05-18T05:10:06.0544571Z 2022-05-18T05:10:06.0544971Z ============================== 6 passed in 0.04s =============================== 2022-05-18T05:10:06.2014513Z Running distributed/rpc/test_tensorpipe_agent ... [2022-05-18 05:10:06.200988] 2022-05-18T05:10:06.2015276Z Executing ['/opt/conda/bin/python', 'distributed/rpc/test_tensorpipe_agent.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:10:06.201093] 2022-05-18T05:10:07.0836828Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl_1q2uo8 2022-05-18T05:10:07.0838320Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl_1q2uo8/_remote_module_non_scriptable.py 2022-05-18T05:10:08.3721783Z 2022-05-18T05:10:08.3721960Z real 65m19.712s 2022-05-18T05:10:08.3722238Z user 128m58.629s 2022-05-18T05:10:08.3722517Z sys 96m36.972s 2022-05-18T05:10:08.3722750Z + assert_git_not_dirty 2022-05-18T05:10:08.3723322Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed != *rocm* ]] 2022-05-18T05:10:08.3723834Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed != *xla* ]] 2022-05-18T05:10:08.3726626Z ++ git status --porcelain 2022-05-18T05:10:09.1215472Z + git_status= 2022-05-18T05:10:09.1215913Z + [[ -n '' ]] 2022-05-18T05:10:09.1216388Z + [[ linux-bionic-cuda10.2-py3.9-gcc7-distributed == *cuda* ]] 2022-05-18T05:10:09.1216731Z + [[ 2 == 1 ]] 2022-05-18T05:10:09.1216942Z + [[ 2 == 1 ]] 2022-05-18T05:10:09.1217165Z + cleanup 2022-05-18T05:10:09.1217410Z + retcode=0 2022-05-18T05:10:09.1217617Z + set +x 2022-05-18T05:10:09.1217850Z EXITED_USER_LAND 2022-05-18T05:10:09.1297878Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master 2022-05-18T05:10:09.1298245Z with: 2022-05-18T05:10:09.1298788Z github-token: *** 2022-05-18T05:10:09.1299014Z env: 2022-05-18T05:10:09.1299234Z IN_CI: 1 2022-05-18T05:10:09.1299592Z IS_GHA: 1 2022-05-18T05:10:09.1299821Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:10:09.1300091Z GPU_FLAG: --gpus all 2022-05-18T05:10:09.1300338Z ##[endgroup] 2022-05-18T05:10:09.1332075Z ##[group]Run nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a 2022-05-18T05:10:09.1332373Z with: 2022-05-18T05:10:09.1332600Z shell: bash 2022-05-18T05:10:09.1332849Z timeout_minutes: 10 2022-05-18T05:10:09.1333083Z max_attempts: 5 2022-05-18T05:10:09.1333336Z retry_wait_seconds: 30 2022-05-18T05:10:09.1333887Z command: set -x python3 -m pip install requests==2.26.0 GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}") echo "::set-output name=job-id::${GHA_WORKFLOW_JOB_ID}" 2022-05-18T05:10:09.1334395Z polling_interval_seconds: 1 2022-05-18T05:10:09.1334652Z warning_on_retry: true 2022-05-18T05:10:09.1334917Z continue_on_error: false 2022-05-18T05:10:09.1335162Z env: 2022-05-18T05:10:09.1335360Z IN_CI: 1 2022-05-18T05:10:09.1335594Z IS_GHA: 1 2022-05-18T05:10:09.1335850Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:10:09.1336102Z GPU_FLAG: --gpus all 2022-05-18T05:10:09.1336495Z GITHUB_TOKEN: *** 2022-05-18T05:10:09.1336746Z ##[endgroup] 2022-05-18T05:10:09.1774276Z 2022-05-18T05:10:09.1848271Z + python3 -m pip install requests==2.26.0 2022-05-18T05:10:09.4826663Z Defaulting to user installation because normal site-packages is not writeable 2022-05-18T05:10:09.5046231Z Requirement already satisfied: requests==2.26.0 in /home/ec2-user/.local/lib/python3.7/site-packages (2.26.0) 2022-05-18T05:10:09.5260081Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (3.3) 2022-05-18T05:10:09.5276601Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (1.26.9) 2022-05-18T05:10:09.5498173Z Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2.0.12) 
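Note: the command value shown for the nick-fields/retry step above is a multi-line shell script that the runner log flattens onto a single line. A minimal reconstruction of that step script, assuming only the line breaks implied by the set -x trace around it (the commands themselves are taken verbatim from the log), is:

# reconstructed retry-step script; line breaks are assumed, commands are copied from the logged command string
set -x
# install the HTTP client the helper script needs (the trace shows it was already satisfied)
python3 -m pip install requests==2.26.0
# resolve this runner's workflow job id via the repo helper script and expose it as a step output
GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}")
echo "::set-output name=job-id::${GHA_WORKFLOW_JOB_ID}"

The trace that follows matches this order: pip reports requests 2.26.0 as already installed, the helper script resolves job id 6482671459, and the step echoes it as the job-id output before "Command completed after 1 attempt(s)".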
2022-05-18T05:10:09.5524983Z Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2021.10.8) 2022-05-18T05:10:09.6681363Z ++ python3 .github/scripts/get_workflow_job_id.py 2342799949 i-023c3009b9c09a97d 2022-05-18T05:10:11.1307261Z + GHA_WORKFLOW_JOB_ID=6482671459 2022-05-18T05:10:11.1308303Z + echo '::set-output name=job-id::6482671459' 2022-05-18T05:10:11.1841527Z Command completed after 1 attempt(s). 2022-05-18T05:10:11.1842038Z 2022-05-18T05:10:11.1992187Z Prepare all required actions 2022-05-18T05:10:11.1992628Z Getting action download info 2022-05-18T05:10:11.3449436Z Download action repository 'actions/upload-artifact@v2' (SHA:82c141cc518b40d92cc801eee768e7aafc9c2fa2) 2022-05-18T05:10:11.4710552Z ##[group]Run ./.github/actions/upload-test-artifacts 2022-05-18T05:10:11.4710833Z with: 2022-05-18T05:10:11.4711191Z file-suffix: test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482671459 2022-05-18T05:10:11.4711553Z env: 2022-05-18T05:10:11.4711769Z IN_CI: 1 2022-05-18T05:10:11.4711975Z IS_GHA: 1 2022-05-18T05:10:11.4712223Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:10:11.4712490Z GPU_FLAG: --gpus all 2022-05-18T05:10:11.4712719Z ##[endgroup] 2022-05-18T05:10:11.4739515Z ##[group]Run # Remove any previous test jsons if they exist 2022-05-18T05:10:11.4739906Z # Remove any previous test jsons if they exist 2022-05-18T05:10:11.4740229Z rm -f test-jsons-*.zip 2022-05-18T05:10:11.4740592Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' 2022-05-18T05:10:11.4753608Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T05:10:11.4753916Z env: 2022-05-18T05:10:11.4754138Z IN_CI: 1 2022-05-18T05:10:11.4754346Z IS_GHA: 1 2022-05-18T05:10:11.4754595Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:10:11.4754865Z GPU_FLAG: --gpus all 2022-05-18T05:10:11.4755220Z FILE_SUFFIX: test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482671459 2022-05-18T05:10:11.4755706Z ##[endgroup] 2022-05-18T05:10:11.4872551Z adding: test/allowlist_for_publicAPI.json (deflated 82%) 2022-05-18T05:10:11.4907466Z adding: test/benchmark_utils/callgrind_artifacts.json (deflated 92%) 2022-05-18T05:10:11.4908635Z adding: test/.pytorch-slow-tests.json (deflated 71%) 2022-05-18T05:10:11.4912760Z adding: test/.pytorch-disabled-tests.json (deflated 83%) 2022-05-18T05:10:11.4934327Z ##[group]Run # Remove any previous test reports if they exist 2022-05-18T05:10:11.4934723Z # Remove any previous test reports if they exist 2022-05-18T05:10:11.4935051Z rm -f test-reports-*.zip 2022-05-18T05:10:11.4935375Z zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' 2022-05-18T05:10:11.4947460Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T05:10:11.4947760Z env: 2022-05-18T05:10:11.4947964Z IN_CI: 1 2022-05-18T05:10:11.4948189Z IS_GHA: 1 2022-05-18T05:10:11.4948436Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:10:11.4948691Z GPU_FLAG: --gpus all 2022-05-18T05:10:11.4949058Z FILE_SUFFIX: test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482671459 2022-05-18T05:10:11.4949410Z ##[endgroup] 2022-05-18T05:10:11.5064141Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDdpComparisonTest-20220518040457.xml (deflated 41%) 2022-05-18T05:10:11.5065112Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518040502.xml (deflated 41%) 2022-05-18T05:10:11.5066027Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518040509.xml (deflated 41%) 2022-05-18T05:10:11.5066920Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518040515.xml (deflated 41%) 2022-05-18T05:10:11.5067786Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518040522.xml (deflated 41%) 2022-05-18T05:10:11.5068680Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518040528.xml (deflated 41%) 2022-05-18T05:10:11.5069581Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518040535.xml (deflated 41%) 2022-05-18T05:10:11.5070621Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518040540.xml (deflated 41%) 2022-05-18T05:10:11.5071510Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRpcTest-20220518040546.xml (deflated 40%) 2022-05-18T05:10:11.5072350Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040555.xml (deflated 40%) 2022-05-18T05:10:11.5073217Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040559.xml (deflated 40%) 2022-05-18T05:10:11.5074127Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040604.xml (deflated 40%) 2022-05-18T05:10:11.5074997Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040609.xml (deflated 40%) 2022-05-18T05:10:11.5075849Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040614.xml (deflated 40%) 2022-05-18T05:10:11.5076727Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040618.xml (deflated 40%) 2022-05-18T05:10:11.5077588Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040623.xml (deflated 40%) 2022-05-18T05:10:11.5078551Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518040628.xml (deflated 40%) 2022-05-18T05:10:11.5079467Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040632.xml (deflated 42%) 2022-05-18T05:10:11.5080416Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040641.xml (deflated 42%) 2022-05-18T05:10:11.5081374Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040653.xml (deflated 42%) 2022-05-18T05:10:11.5082328Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040705.xml (deflated 43%) 2022-05-18T05:10:11.5083286Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040715.xml (deflated 43%) 2022-05-18T05:10:11.5084223Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040725.xml (deflated 43%) 2022-05-18T05:10:11.5085174Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040735.xml (deflated 43%) 2022-05-18T05:10:11.5086123Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040745.xml (deflated 43%) 2022-05-18T05:10:11.5087069Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040756.xml (deflated 43%) 2022-05-18T05:10:11.5088018Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040806.xml (deflated 43%) 2022-05-18T05:10:11.5088970Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040811.xml (deflated 43%) 2022-05-18T05:10:11.5089917Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040815.xml (deflated 43%) 2022-05-18T05:10:11.5090928Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040820.xml (deflated 43%) 2022-05-18T05:10:11.5091902Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040825.xml (deflated 42%) 2022-05-18T05:10:11.5092849Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040831.xml (deflated 43%) 2022-05-18T05:10:11.5093780Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040837.xml (deflated 42%) 2022-05-18T05:10:11.5094731Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040849.xml (deflated 43%) 2022-05-18T05:10:11.5095678Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040901.xml (deflated 43%) 2022-05-18T05:10:11.5096630Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040920.xml (deflated 43%) 2022-05-18T05:10:11.5097550Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040933.xml (deflated 42%) 2022-05-18T05:10:11.5098636Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040945.xml (deflated 42%) 2022-05-18T05:10:11.5099652Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040950.xml (deflated 42%) 2022-05-18T05:10:11.5100665Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518040958.xml (deflated 42%) 2022-05-18T05:10:11.5101682Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041006.xml (deflated 42%) 2022-05-18T05:10:11.5102676Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041014.xml (deflated 43%) 2022-05-18T05:10:11.5103867Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041024.xml (deflated 42%) 2022-05-18T05:10:11.5104900Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041034.xml (deflated 42%) 2022-05-18T05:10:11.5105911Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041044.xml (deflated 43%) 2022-05-18T05:10:11.5106908Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041054.xml (deflated 42%) 2022-05-18T05:10:11.5107925Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041104.xml (deflated 42%) 2022-05-18T05:10:11.5108934Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041114.xml (deflated 42%) 2022-05-18T05:10:11.5109946Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041124.xml (deflated 42%) 2022-05-18T05:10:11.5110943Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041135.xml (deflated 43%) 2022-05-18T05:10:11.5112035Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041145.xml (deflated 42%) 2022-05-18T05:10:11.5113065Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041155.xml (deflated 43%) 2022-05-18T05:10:11.5114071Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041205.xml (deflated 42%) 2022-05-18T05:10:11.5115086Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041215.xml (deflated 42%) 2022-05-18T05:10:11.5116146Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041225.xml (deflated 42%) 2022-05-18T05:10:11.5117145Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041236.xml (deflated 42%) 2022-05-18T05:10:11.5118159Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041246.xml (deflated 42%) 2022-05-18T05:10:11.5119169Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041256.xml (deflated 42%) 2022-05-18T05:10:11.5120183Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041306.xml (deflated 42%) 2022-05-18T05:10:11.5121260Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041314.xml (deflated 43%) 2022-05-18T05:10:11.5122272Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041324.xml (deflated 42%) 2022-05-18T05:10:11.5123289Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041332.xml (deflated 42%) 2022-05-18T05:10:11.5124297Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041340.xml (deflated 43%) 2022-05-18T05:10:11.5125308Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041350.xml (deflated 42%) 2022-05-18T05:10:11.5126309Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041400.xml (deflated 43%) 2022-05-18T05:10:11.5127321Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041405.xml (deflated 43%) 2022-05-18T05:10:11.5128332Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041410.xml (deflated 43%) 2022-05-18T05:10:11.5129340Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041415.xml (deflated 43%) 2022-05-18T05:10:11.5130333Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041419.xml (deflated 42%) 2022-05-18T05:10:11.5131346Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041426.xml (deflated 42%) 2022-05-18T05:10:11.5132354Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041433.xml (deflated 42%) 2022-05-18T05:10:11.5133361Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041440.xml (deflated 42%) 2022-05-18T05:10:11.5134419Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041447.xml (deflated 42%) 2022-05-18T05:10:11.5135420Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041454.xml (deflated 42%) 2022-05-18T05:10:11.5136436Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041500.xml (deflated 42%) 2022-05-18T05:10:11.5137453Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041507.xml (deflated 42%) 2022-05-18T05:10:11.5138459Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041517.xml (deflated 42%) 2022-05-18T05:10:11.5139454Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041528.xml (deflated 43%) 2022-05-18T05:10:11.5140470Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041532.xml (deflated 43%) 2022-05-18T05:10:11.5141481Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041543.xml (deflated 43%) 2022-05-18T05:10:11.5142555Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041547.xml (deflated 43%) 2022-05-18T05:10:11.5143563Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041552.xml (deflated 42%) 2022-05-18T05:10:11.5144729Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041557.xml (deflated 43%) 2022-05-18T05:10:11.5145695Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041605.xml (deflated 42%) 2022-05-18T05:10:11.5146643Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041610.xml (deflated 42%) 2022-05-18T05:10:11.5147604Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041615.xml (deflated 43%) 2022-05-18T05:10:11.5148537Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041623.xml (deflated 41%) 2022-05-18T05:10:11.5149497Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041633.xml (deflated 41%) 2022-05-18T05:10:11.5150454Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041644.xml (deflated 41%) 2022-05-18T05:10:11.5151407Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041654.xml (deflated 41%) 2022-05-18T05:10:11.5152339Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041704.xml (deflated 42%) 2022-05-18T05:10:11.5153299Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041721.xml (deflated 42%) 2022-05-18T05:10:11.5154241Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041741.xml (deflated 42%) 2022-05-18T05:10:11.5155264Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041758.xml (deflated 42%) 2022-05-18T05:10:11.5156226Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041818.xml (deflated 42%) 2022-05-18T05:10:11.5157156Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041835.xml (deflated 41%) 2022-05-18T05:10:11.5158105Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041851.xml (deflated 42%) 2022-05-18T05:10:11.5159052Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041908.xml (deflated 42%) 2022-05-18T05:10:11.5159998Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041924.xml (deflated 41%) 2022-05-18T05:10:11.5160921Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041940.xml (deflated 42%) 2022-05-18T05:10:11.5161868Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518041958.xml (deflated 41%) 2022-05-18T05:10:11.5162896Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042017.xml (deflated 41%) 2022-05-18T05:10:11.5163842Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042035.xml (deflated 42%) 2022-05-18T05:10:11.5164781Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042055.xml (deflated 43%) 2022-05-18T05:10:11.5165705Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042103.xml (deflated 43%) 2022-05-18T05:10:11.5166692Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518042113.xml (deflated 44%) 2022-05-18T05:10:11.5167694Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518042117.xml (deflated 44%) 2022-05-18T05:10:11.5168688Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518042122.xml (deflated 44%) 2022-05-18T05:10:11.5169503Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestHooks-20220518042127.xml (deflated 79%) 2022-05-18T05:10:11.5170212Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestNoGrad-20220518042127.xml (deflated 54%) 2022-05-18T05:10:11.5170995Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParamInit-20220518042127.xml (deflated 55%) 2022-05-18T05:10:11.5171744Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParityWithDDP-20220518042127.xml (deflated 95%) 2022-05-18T05:10:11.5172450Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043332.xml (deflated 38%) 2022-05-18T05:10:11.5173109Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043338.xml (deflated 38%) 2022-05-18T05:10:11.5173776Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043343.xml (deflated 39%) 2022-05-18T05:10:11.5174473Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043347.xml (deflated 38%) 2022-05-18T05:10:11.5175225Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043352.xml (deflated 38%) 2022-05-18T05:10:11.5175879Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043356.xml (deflated 39%) 2022-05-18T05:10:11.5176537Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043400.xml (deflated 39%) 2022-05-18T05:10:11.5177190Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043404.xml (deflated 39%) 2022-05-18T05:10:11.5177847Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043408.xml (deflated 37%) 2022-05-18T05:10:11.5178488Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043413.xml (deflated 37%) 2022-05-18T05:10:11.5179149Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043419.xml (deflated 37%) 2022-05-18T05:10:11.5179803Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043424.xml (deflated 38%) 2022-05-18T05:10:11.5180455Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043430.xml (deflated 38%) 2022-05-18T05:10:11.5181092Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043435.xml (deflated 39%) 2022-05-18T05:10:11.5181753Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043439.xml (deflated 38%) 2022-05-18T05:10:11.5182477Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518043444.xml (deflated 38%) 2022-05-18T05:10:11.5183215Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043450.xml (deflated 41%) 2022-05-18T05:10:11.5184202Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043455.xml (deflated 41%) 2022-05-18T05:10:11.5185002Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043501.xml (deflated 41%) 2022-05-18T05:10:11.5185801Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043507.xml (deflated 41%) 2022-05-18T05:10:11.5186598Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043512.xml (deflated 41%) 2022-05-18T05:10:11.5187383Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043513.xml (deflated 41%) 2022-05-18T05:10:11.5188176Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043514.xml (deflated 41%) 2022-05-18T05:10:11.5188971Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043520.xml (deflated 41%) 2022-05-18T05:10:11.5189769Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043525.xml (deflated 44%) 2022-05-18T05:10:11.5190536Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043531.xml (deflated 45%) 2022-05-18T05:10:11.5191318Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043536.xml (deflated 43%) 2022-05-18T05:10:11.5192111Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043542.xml (deflated 43%) 2022-05-18T05:10:11.5192900Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043548.xml (deflated 45%) 2022-05-18T05:10:11.5193678Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043553.xml (deflated 45%) 2022-05-18T05:10:11.5194538Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043559.xml (deflated 46%) 2022-05-18T05:10:11.5195332Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043605.xml (deflated 46%) 2022-05-18T05:10:11.5196121Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043610.xml (deflated 44%) 2022-05-18T05:10:11.5196894Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043616.xml (deflated 45%) 2022-05-18T05:10:11.5197686Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043621.xml (deflated 45%) 2022-05-18T05:10:11.5198476Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043627.xml (deflated 44%) 2022-05-18T05:10:11.5199264Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043633.xml (deflated 43%) 2022-05-18T05:10:11.5200035Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043638.xml (deflated 42%) 2022-05-18T05:10:11.5200819Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043644.xml (deflated 41%) 2022-05-18T05:10:11.5201677Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043649.xml (deflated 42%) 2022-05-18T05:10:11.5202466Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043655.xml (deflated 44%) 2022-05-18T05:10:11.5203232Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043700.xml (deflated 44%) 2022-05-18T05:10:11.5204028Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043705.xml (deflated 42%) 2022-05-18T05:10:11.5204809Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043709.xml (deflated 41%) 2022-05-18T05:10:11.5205596Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043715.xml (deflated 41%) 2022-05-18T05:10:11.5206391Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043719.xml (deflated 41%) 2022-05-18T05:10:11.5207163Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043724.xml (deflated 41%) 2022-05-18T05:10:11.5207949Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043730.xml (deflated 42%) 2022-05-18T05:10:11.5208739Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043735.xml (deflated 41%) 2022-05-18T05:10:11.5209527Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043741.xml (deflated 41%) 2022-05-18T05:10:11.5210299Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043747.xml (deflated 41%) 2022-05-18T05:10:11.5211080Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043752.xml (deflated 41%) 2022-05-18T05:10:11.5211876Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043758.xml (deflated 41%) 2022-05-18T05:10:11.5212660Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043804.xml (deflated 41%) 2022-05-18T05:10:11.5213477Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043809.xml (deflated 42%) 2022-05-18T05:10:11.5214264Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043815.xml (deflated 41%) 2022-05-18T05:10:11.5215048Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043820.xml (deflated 42%) 2022-05-18T05:10:11.5215836Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043826.xml (deflated 41%) 2022-05-18T05:10:11.5216605Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043831.xml (deflated 41%) 2022-05-18T05:10:11.5217386Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043838.xml (deflated 42%) 2022-05-18T05:10:11.5218170Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043842.xml (deflated 41%) 2022-05-18T05:10:11.5218953Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043845.xml (deflated 41%) 2022-05-18T05:10:11.5219726Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043851.xml (deflated 42%) 2022-05-18T05:10:11.5220510Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043857.xml (deflated 41%) 2022-05-18T05:10:11.5221360Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043902.xml (deflated 41%) 2022-05-18T05:10:11.5222155Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043907.xml (deflated 42%) 2022-05-18T05:10:11.5222931Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043911.xml (deflated 42%) 2022-05-18T05:10:11.5223948Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043915.xml (deflated 41%) 2022-05-18T05:10:11.5224740Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043921.xml (deflated 42%) 2022-05-18T05:10:11.5225528Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043925.xml (deflated 42%) 2022-05-18T05:10:11.5226305Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043930.xml (deflated 42%) 2022-05-18T05:10:11.5227098Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043936.xml (deflated 42%) 2022-05-18T05:10:11.5227878Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518043957.xml (deflated 44%) 2022-05-18T05:10:11.5228672Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044002.xml (deflated 42%) 2022-05-18T05:10:11.5229448Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044008.xml (deflated 42%) 2022-05-18T05:10:11.5230239Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044012.xml (deflated 41%) 2022-05-18T05:10:11.5231031Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044017.xml (deflated 41%) 2022-05-18T05:10:11.5231817Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044023.xml (deflated 42%) 2022-05-18T05:10:11.5232591Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518044029.xml (deflated 41%) 2022-05-18T05:10:11.5233432Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044035.xml (deflated 41%) 2022-05-18T05:10:11.5234204Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044039.xml (deflated 41%) 2022-05-18T05:10:11.5234959Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044043.xml (deflated 42%) 2022-05-18T05:10:11.5235695Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044044.xml (deflated 42%) 2022-05-18T05:10:11.5236441Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044048.xml (deflated 40%) 2022-05-18T05:10:11.5237189Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044051.xml (deflated 41%) 2022-05-18T05:10:11.5237940Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044055.xml (deflated 40%) 2022-05-18T05:10:11.5258549Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044059.xml (deflated 42%) 
2022-05-18T05:10:11.5259391Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518044100.xml (deflated 41%) 2022-05-18T05:10:11.5260229Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLNoGPUTest-20220518044104.xml (deflated 41%) 2022-05-18T05:10:11.5261208Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044105.xml (deflated 39%) 2022-05-18T05:10:11.5262014Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044111.xml (deflated 39%) 2022-05-18T05:10:11.5262824Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044116.xml (deflated 39%) 2022-05-18T05:10:11.5263880Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044121.xml (deflated 39%) 2022-05-18T05:10:11.5264687Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044127.xml (deflated 39%) 2022-05-18T05:10:11.5265492Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044132.xml (deflated 39%) 2022-05-18T05:10:11.5266296Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044137.xml (deflated 39%) 2022-05-18T05:10:11.5267093Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044144.xml (deflated 39%) 2022-05-18T05:10:11.5267875Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044149.xml (deflated 39%) 2022-05-18T05:10:11.5268679Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044154.xml (deflated 39%) 2022-05-18T05:10:11.5269484Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044203.xml (deflated 39%) 2022-05-18T05:10:11.5270279Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044208.xml (deflated 39%) 2022-05-18T05:10:11.5271058Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044213.xml (deflated 39%) 2022-05-18T05:10:11.5271861Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044219.xml (deflated 39%) 2022-05-18T05:10:11.5272757Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044224.xml (deflated 39%) 2022-05-18T05:10:11.5273650Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044229.xml (deflated 39%) 2022-05-18T05:10:11.5274497Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518044234.xml (deflated 39%) 2022-05-18T05:10:11.5275472Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-RendezvousEnvTest-20220518044243.xml (deflated 40%) 2022-05-18T05:10:11.5276240Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-TimeoutTest-20220518044245.xml (deflated 40%) 2022-05-18T05:10:11.5277013Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044252.xml (deflated 38%) 2022-05-18T05:10:11.5277734Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044256.xml (deflated 38%) 2022-05-18T05:10:11.5278448Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044301.xml (deflated 38%) 2022-05-18T05:10:11.5279167Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044305.xml (deflated 38%) 2022-05-18T05:10:11.5279866Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044311.xml (deflated 38%) 2022-05-18T05:10:11.5280576Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044316.xml (deflated 39%) 2022-05-18T05:10:11.5281281Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044320.xml (deflated 38%) 2022-05-18T05:10:11.5282097Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518044324.xml (deflated 38%) 2022-05-18T05:10:11.5282874Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044328.xml (deflated 44%) 2022-05-18T05:10:11.5283728Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044333.xml (deflated 45%) 2022-05-18T05:10:11.5284581Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044339.xml (deflated 42%) 2022-05-18T05:10:11.5285425Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044345.xml (deflated 43%) 2022-05-18T05:10:11.5286261Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044351.xml (deflated 45%) 2022-05-18T05:10:11.5287115Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044356.xml (deflated 45%) 2022-05-18T05:10:11.5287965Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044402.xml (deflated 46%) 2022-05-18T05:10:11.5288811Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044408.xml (deflated 46%) 2022-05-18T05:10:11.5289646Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044413.xml (deflated 44%) 2022-05-18T05:10:11.5290494Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044419.xml (deflated 46%) 2022-05-18T05:10:11.5291340Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044425.xml (deflated 46%) 2022-05-18T05:10:11.5292195Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044430.xml (deflated 44%) 2022-05-18T05:10:11.5293025Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044436.xml (deflated 44%) 2022-05-18T05:10:11.5293869Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044441.xml (deflated 44%) 2022-05-18T05:10:11.5294780Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044445.xml (deflated 44%) 2022-05-18T05:10:11.5295638Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044451.xml (deflated 44%) 2022-05-18T05:10:11.5296473Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044455.xml (deflated 44%) 2022-05-18T05:10:11.5297324Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044459.xml (deflated 45%) 2022-05-18T05:10:11.5298173Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044503.xml (deflated 45%) 2022-05-18T05:10:11.5299024Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044507.xml (deflated 50%) 2022-05-18T05:10:11.5299857Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044512.xml (deflated 42%) 2022-05-18T05:10:11.5300707Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044518.xml (deflated 42%) 2022-05-18T05:10:11.5301551Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044523.xml (deflated 41%) 2022-05-18T05:10:11.5302399Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044529.xml (deflated 41%) 2022-05-18T05:10:11.5303298Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044534.xml (deflated 42%) 2022-05-18T05:10:11.5304336Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044540.xml (deflated 42%) 2022-05-18T05:10:11.5305186Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044544.xml (deflated 42%) 2022-05-18T05:10:11.5306032Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044548.xml (deflated 41%) 2022-05-18T05:10:11.5306860Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044552.xml (deflated 41%) 2022-05-18T05:10:11.5307708Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044556.xml (deflated 44%) 2022-05-18T05:10:11.5308560Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044600.xml (deflated 45%) 2022-05-18T05:10:11.5309404Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044604.xml (deflated 41%) 2022-05-18T05:10:11.5310244Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044608.xml (deflated 41%) 2022-05-18T05:10:11.5311098Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044613.xml (deflated 41%) 2022-05-18T05:10:11.5311946Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044617.xml (deflated 41%) 2022-05-18T05:10:11.5312792Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044621.xml (deflated 41%) 2022-05-18T05:10:11.5313644Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518044627.xml (deflated 40%) 2022-05-18T05:10:11.5314455Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044633.xml (deflated 39%) 2022-05-18T05:10:11.5315255Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044637.xml (deflated 39%) 2022-05-18T05:10:11.5316130Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044643.xml (deflated 39%) 2022-05-18T05:10:11.5316945Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044647.xml (deflated 39%) 2022-05-18T05:10:11.5317730Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044652.xml (deflated 39%) 2022-05-18T05:10:11.5318531Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044656.xml (deflated 39%) 2022-05-18T05:10:11.5319327Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044700.xml (deflated 39%) 2022-05-18T05:10:11.5320126Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044704.xml (deflated 40%) 2022-05-18T05:10:11.5320905Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044712.xml (deflated 40%) 2022-05-18T05:10:11.5321707Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044716.xml (deflated 40%) 2022-05-18T05:10:11.5322504Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044722.xml (deflated 39%) 2022-05-18T05:10:11.5323300Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044727.xml (deflated 39%) 2022-05-18T05:10:11.5324165Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044732.xml (deflated 39%) 2022-05-18T05:10:11.5324962Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044736.xml (deflated 39%) 2022-05-18T05:10:11.5325762Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044740.xml (deflated 40%) 2022-05-18T05:10:11.5326552Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044744.xml (deflated 39%) 2022-05-18T05:10:11.5327333Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044748.xml (deflated 39%) 2022-05-18T05:10:11.5328125Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044754.xml (deflated 40%) 2022-05-18T05:10:11.5328931Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044758.xml (deflated 40%) 2022-05-18T05:10:11.5329728Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044803.xml (deflated 40%) 2022-05-18T05:10:11.5330509Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044809.xml (deflated 39%) 2022-05-18T05:10:11.5331307Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044813.xml (deflated 40%) 2022-05-18T05:10:11.5332101Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044817.xml (deflated 39%) 2022-05-18T05:10:11.5332897Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044823.xml (deflated 40%) 2022-05-18T05:10:11.5333683Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044827.xml (deflated 40%) 2022-05-18T05:10:11.5334478Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044831.xml (deflated 39%) 2022-05-18T05:10:11.5335277Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044837.xml (deflated 40%) 2022-05-18T05:10:11.5336130Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044841.xml (deflated 39%) 2022-05-18T05:10:11.5336924Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044845.xml (deflated 39%) 2022-05-18T05:10:11.5337717Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044851.xml (deflated 39%) 2022-05-18T05:10:11.5338511Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044855.xml (deflated 39%) 2022-05-18T05:10:11.5339306Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044859.xml (deflated 39%) 2022-05-18T05:10:11.5340088Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044904.xml (deflated 39%) 2022-05-18T05:10:11.5340881Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044911.xml (deflated 39%) 2022-05-18T05:10:11.5341680Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044916.xml (deflated 39%) 2022-05-18T05:10:11.5342480Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044920.xml (deflated 39%) 2022-05-18T05:10:11.5343260Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044925.xml (deflated 39%) 2022-05-18T05:10:11.5344308Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044930.xml (deflated 39%) 2022-05-18T05:10:11.5345109Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044934.xml (deflated 39%) 2022-05-18T05:10:11.5345902Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044941.xml (deflated 39%) 2022-05-18T05:10:11.5346684Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044945.xml (deflated 39%) 2022-05-18T05:10:11.5347485Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044949.xml (deflated 40%) 2022-05-18T05:10:11.5348278Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044953.xml (deflated 39%) 2022-05-18T05:10:11.5349074Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518044959.xml (deflated 39%) 2022-05-18T05:10:11.5349861Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045003.xml (deflated 40%) 2022-05-18T05:10:11.5350654Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045008.xml (deflated 41%) 2022-05-18T05:10:11.5351454Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045009.xml (deflated 40%) 2022-05-18T05:10:11.5352246Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045013.xml (deflated 41%) 2022-05-18T05:10:11.5353028Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045014.xml (deflated 40%) 2022-05-18T05:10:11.5353824Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518045020.xml (deflated 40%) 2022-05-18T05:10:11.5354592Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045024.xml (deflated 39%) 2022-05-18T05:10:11.5355329Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045025.xml (deflated 39%) 2022-05-18T05:10:11.5356041Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045026.xml (deflated 39%) 2022-05-18T05:10:11.5356842Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045027.xml (deflated 39%) 2022-05-18T05:10:11.5357584Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045028.xml (deflated 38%) 2022-05-18T05:10:11.5358312Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518045029.xml (deflated 39%) 2022-05-18T05:10:11.5359053Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-RendezvousEnvTest-20220518045030.xml (deflated 39%) 2022-05-18T05:10:11.5359808Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-TimeoutTest-20220518045033.xml (deflated 41%) 2022-05-18T05:10:11.5360657Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_mixed_precision/TEST-TestFSDPMixedPrecisionSharded-20220518045037.xml (deflated 94%) 2022-05-18T05:10:11.5361612Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_mixed_precision/TEST-TestFSDPMixedPrecisionUnsharded-20220518045037.xml (deflated 57%) 2022-05-18T05:10:11.5362511Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParams-20220518045552.xml (deflated 93%) 2022-05-18T05:10:11.5363450Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParamsNoShard-20220518045552.xml (deflated 84%) 2022-05-18T05:10:11.5364443Z adding: test/test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerDistributed-20220518045908.xml (deflated 90%) 2022-05-18T05:10:11.5365561Z adding: test/test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerSingleRank-20220518045908.xml (deflated 73%) 2022-05-18T05:10:11.5366529Z adding: 
test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestCreateTensorFromParams-20220518050103.xml (deflated 43%) 2022-05-18T05:10:11.5367470Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorMetadata-20220518050103.xml (deflated 44%) 2022-05-18T05:10:11.5368363Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestLocalTensor-20220518050103.xml (deflated 59%) 2022-05-18T05:10:11.5369227Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestModuleHookApi-20220518050103.xml (deflated 58%) 2022-05-18T05:10:11.5370108Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardParameter-20220518050103.xml (deflated 61%) 2022-05-18T05:10:11.5370955Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardTensor-20220518050103.xml (deflated 60%) 2022-05-18T05:10:11.5371854Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorChunked-20220518050103.xml (deflated 88%) 2022-05-18T05:10:11.5372788Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorCustomOps-20220518050103.xml (deflated 69%) 2022-05-18T05:10:11.5373738Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorEnumerable-20220518050103.xml (deflated 84%) 2022-05-18T05:10:11.5374701Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalShards-20220518050103.xml (deflated 85%) 2022-05-18T05:10:11.5375719Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalTensor-20220518050103.xml (deflated 61%) 2022-05-18T05:10:11.5376641Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050218.xml (deflated 41%) 2022-05-18T05:10:11.5377570Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050222.xml (deflated 41%) 2022-05-18T05:10:11.5378441Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050226.xml (deflated 41%) 2022-05-18T05:10:11.5379311Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050230.xml (deflated 40%) 2022-05-18T05:10:11.5380189Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050234.xml (deflated 40%) 2022-05-18T05:10:11.5381058Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050238.xml (deflated 40%) 2022-05-18T05:10:11.5381910Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050242.xml (deflated 41%) 2022-05-18T05:10:11.5382777Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050246.xml (deflated 41%) 2022-05-18T05:10:11.5383917Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupGlooWrapperTest-20220518050250.xml (deflated 40%) 
2022-05-18T05:10:11.5384804Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20220518050254.xml (deflated 40%) 2022-05-18T05:10:11.5385752Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20220518050258.xml (deflated 39%) 2022-05-18T05:10:11.5386622Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20220518050304.xml (deflated 39%) 2022-05-18T05:10:11.5387496Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20220518050309.xml (deflated 39%) 2022-05-18T05:10:11.5388357Z adding: test/test-reports/python-unittest/distributed.test_pg_wrapper/TEST-ProcessGroupNCCLWrapperTest-20220518050314.xml (deflated 39%) 2022-05-18T05:10:11.5389175Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_grad_acc/TEST-TestGradAcc-20220518050320.xml (deflated 93%) 2022-05-18T05:10:11.5390058Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518050400.xml (deflated 44%) 2022-05-18T05:10:11.5391059Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518050403.xml (deflated 43%) 2022-05-18T05:10:11.5392046Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518050406.xml (deflated 43%) 2022-05-18T05:10:11.5392984Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050410.xml (deflated 42%) 2022-05-18T05:10:11.5393867Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050417.xml (deflated 42%) 2022-05-18T05:10:11.5394766Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050424.xml (deflated 42%) 2022-05-18T05:10:11.5395664Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050430.xml (deflated 41%) 2022-05-18T05:10:11.5396565Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050437.xml (deflated 42%) 2022-05-18T05:10:11.5397444Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050444.xml (deflated 41%) 2022-05-18T05:10:11.5398405Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050450.xml (deflated 42%) 2022-05-18T05:10:11.5399318Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518050457.xml (deflated 42%) 2022-05-18T05:10:11.5400160Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_comm/TEST-TestCommunication-20220518050503.xml (deflated 91%) 2022-05-18T05:10:11.5400988Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_sharded_grad_scaler/TEST-TestShardGradScaler-20220518050531.xml (deflated 63%) 2022-05-18T05:10:11.5401936Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_sharded_grad_scaler/TEST-TestShardedGradScalerParityWithDDP-20220518050531.xml (deflated 83%) 2022-05-18T05:10:11.5402798Z adding: 
test/test-reports/python-unittest/distributed.algorithms.test_join/TEST-TestJoin-20220518050557.xml (deflated 79%) 2022-05-18T05:10:11.5403565Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMisc-20220518050623.xml (deflated 72%) 2022-05-18T05:10:11.5404421Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestDistributedCheckpointing-20220518050647.xml (deflated 75%) 2022-05-18T05:10:11.5405312Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestStorageKeys-20220518050647.xml (deflated 40%) 2022-05-18T05:10:11.5406217Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_matrix_ops/TEST-TestShardedTensorMatrixOps-20220518050706.xml (deflated 86%) 2022-05-18T05:10:11.5407141Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_memory/TEST-TestFSDPMemory-20220518050722.xml (deflated 55%) 2022-05-18T05:10:11.5408042Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedReshardOnLoad-20220518050735.xml (deflated 63%) 2022-05-18T05:10:11.5409029Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedStateDictSaveLoad-20220518050735.xml (deflated 42%) 2022-05-18T05:10:11.5410123Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedStateDictSaveLoadWithSharedTensor-20220518050735.xml (deflated 45%) 2022-05-18T05:10:11.5411128Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_example/TEST-LocalTimerExample-20220518050749.xml (deflated 54%) 2022-05-18T05:10:11.5411990Z adding: test/test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorOps-20220518050758.xml (deflated 67%) 2022-05-18T05:10:11.5412832Z adding: test/test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorReshard-20220518050758.xml (deflated 60%) 2022-05-18T05:10:11.5413637Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_input/TEST-TestInput-20220518050807.xml (deflated 57%) 2022-05-18T05:10:11.5414450Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_tensor_ops/TEST-TestTensorOps-20220518050815.xml (deflated 72%) 2022-05-18T05:10:11.5415349Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_linear/TEST-TestShardedTensorOpsLinear-20220518050823.xml (deflated 68%) 2022-05-18T05:10:11.5416233Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerServerTest-20220518050829.xml (deflated 71%) 2022-05-18T05:10:11.5417093Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerTest-20220518050829.xml (deflated 69%) 2022-05-18T05:10:11.5418003Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-MultiprocessingRequestQueueTest-20220518050829.xml (deflated 66%) 2022-05-18T05:10:11.5418959Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_uneven/TEST-TestUnevenParamShard-20220518050835.xml (deflated 41%) 2022-05-18T05:10:11.5419766Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16/TEST-TestPureFP16-20220518050841.xml (deflated 51%) 2022-05-18T05:10:11.5420554Z adding: 
test/test-reports/python-unittest/distributed.fsdp.test_fsdp_traversal/TEST-TestTraversal-20220518050846.xml (deflated 41%) 2022-05-18T05:10:11.5421414Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_embedding/TEST-TestShardedEmbedding-20220518050852.xml (deflated 60%) 2022-05-18T05:10:11.5422324Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_chunk/TEST-TestShardedTensorChunkOps-20220518050857.xml (deflated 60%) 2022-05-18T05:10:11.5423226Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax/TEST-TestShardedSoftmax-20220518050902.xml (deflated 59%) 2022-05-18T05:10:11.5424234Z adding: test/test-reports/python-unittest/distributed.test_data_parallel/TEST-TestDataParallel-20220518050909.xml (deflated 83%) 2022-05-18T05:10:11.5425089Z adding: test/test-reports/python-unittest/distributed.test_data_parallel/TEST-TestDataParallelDeviceTypeCUDA-20220518050909.xml (deflated 85%) 2022-05-18T05:10:11.5425963Z adding: test/test-reports/python-unittest/distributed.fsdp.test_flatten_params_wrapper/TEST-TestFlattenParams-20220518050914.xml (deflated 81%) 2022-05-18T05:10:11.5426827Z adding: test/test-reports/python-unittest/distributed.fsdp.test_flatten_params_wrapper/TEST-TestFlattenParamsCUDA-20220518050914.xml (deflated 81%) 2022-05-18T05:10:11.5427807Z adding: test/test-reports/python-unittest/distributed.fsdp.test_flatten_params_wrapper/TEST-TestFlattenParamsCUDAHalf-20220518050914.xml (deflated 81%) 2022-05-18T05:10:11.5428662Z adding: test/test-reports/python-unittest/distributed.elastic.utils.logging_test/TEST-LoggingTest-20220518050918.xml (deflated 54%) 2022-05-18T05:10:11.5429478Z adding: test/test-reports/python-unittest/distributed.elastic.metrics.api_test/TEST-MetricsApiTest-20220518050920.xml (deflated 63%) 2022-05-18T05:10:11.5430241Z adding: test/test-reports/python-unittest/distributed.test_nccl/TEST-TestNCCLCUDA-20220518050925.xml (deflated 75%) 2022-05-18T05:10:11.5468737Z ##[group]Run seemethere/upload-artifact-s3@v4 2022-05-18T05:10:11.5469031Z with: 2022-05-18T05:10:11.5469269Z retention-days: 14 2022-05-18T05:10:11.5469520Z if-no-files-found: warn 2022-05-18T05:10:11.5469806Z path: test-jsons-*.zip 2022-05-18T05:10:11.5470065Z name: artifact 2022-05-18T05:10:11.5470301Z s3-bucket: gha-artifacts 2022-05-18T05:10:11.5470570Z region: us-east-1 2022-05-18T05:10:11.5470800Z env: 2022-05-18T05:10:11.5470998Z IN_CI: 1 2022-05-18T05:10:11.5471224Z IS_GHA: 1 2022-05-18T05:10:11.5471475Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:10:11.5471725Z GPU_FLAG: --gpus all 2022-05-18T05:10:11.5471974Z ##[endgroup] 2022-05-18T05:10:11.9755065Z With the provided path, there will be 1 file uploaded 2022-05-18T05:10:11.9755961Z Uploading to s3 prefix: pytorch/pytorch/2342799949/1/artifact 2022-05-18T05:10:11.9766577Z Starting upload of test-jsons-test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482671459.zip 2022-05-18T05:10:12.1128773Z Finished upload of test-jsons-test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482671459.zip 2022-05-18T05:10:12.1256782Z ##[group]Run seemethere/upload-artifact-s3@v4 2022-05-18T05:10:12.1257081Z with: 2022-05-18T05:10:12.1257302Z retention-days: 14 2022-05-18T05:10:12.1257585Z if-no-files-found: error 2022-05-18T05:10:12.1257869Z path: test-reports-*.zip 2022-05-18T05:10:12.1258112Z name: artifact 2022-05-18T05:10:12.1258366Z s3-bucket: gha-artifacts 2022-05-18T05:10:12.1258632Z region: us-east-1 2022-05-18T05:10:12.1258847Z env: 
2022-05-18T05:10:12.1259064Z IN_CI: 1 2022-05-18T05:10:12.1259291Z IS_GHA: 1 2022-05-18T05:10:12.1259522Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:10:12.1259790Z GPU_FLAG: --gpus all 2022-05-18T05:10:12.1260041Z ##[endgroup] 2022-05-18T05:10:12.5439126Z With the provided path, there will be 1 file uploaded 2022-05-18T05:10:12.5439947Z Uploading to s3 prefix: pytorch/pytorch/2342799949/1/artifact 2022-05-18T05:10:12.5452115Z Starting upload of test-reports-test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482671459.zip 2022-05-18T05:10:12.7504545Z Finished upload of test-reports-test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482671459.zip 2022-05-18T05:10:12.7641038Z ##[group]Run set -x 2022-05-18T05:10:12.7641357Z set -x 2022-05-18T05:10:12.7641666Z python3 -m pip install -r requirements.txt 2022-05-18T05:10:12.7642010Z python3 -m pip install boto3==1.19.12 2022-05-18T05:10:12.7642392Z python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2022-05-18T05:10:12.7655791Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T05:10:12.7656093Z env: 2022-05-18T05:10:12.7656298Z IN_CI: 1 2022-05-18T05:10:12.7656530Z IS_GHA: 1 2022-05-18T05:10:12.7656784Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:10:12.7657034Z GPU_FLAG: --gpus all 2022-05-18T05:10:12.7657308Z AWS_DEFAULT_REGION: us-east-1 2022-05-18T05:10:12.7657572Z BRANCH: master 2022-05-18T05:10:12.7657871Z JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test 2022-05-18T05:10:12.7658201Z TEST_CONFIG: distributed 2022-05-18T05:10:12.7658462Z SHARD_NUMBER: 2 2022-05-18T05:10:12.7658804Z BUILD_ENVIRONMENT: linux-bionic-cuda10.2-py3.9-gcc7 2022-05-18T05:10:12.7659264Z PR_NUMBER: 2022-05-18T05:10:12.7659529Z SHA1: 3b2375291aab7b48442f2e6fb1ef66cebc761e24 2022-05-18T05:10:12.7659810Z TAG: 2022-05-18T05:10:12.7660045Z WORKFLOW_ID: 2342799949 2022-05-18T05:10:12.7660451Z GITHUB_TOKEN: *** 2022-05-18T05:10:12.7660723Z GHA_WORKFLOW_JOB_ID: 6482671459 2022-05-18T05:10:12.7660988Z ##[endgroup] 2022-05-18T05:10:12.7690248Z + python3 -m pip install -r requirements.txt 2022-05-18T05:10:13.0578813Z Defaulting to user installation because normal site-packages is not writeable 2022-05-18T05:10:13.0889120Z Ignoring dataclasses: markers 'python_version < "3.7"' don't match your environment 2022-05-18T05:10:13.0892750Z Requirement already satisfied: astunparse in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (1.6.3) 2022-05-18T05:10:13.0929289Z Requirement already satisfied: expecttest in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 3)) (0.1.3) 2022-05-18T05:10:13.0939994Z Requirement already satisfied: future in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 4)) (0.18.2) 2022-05-18T05:10:13.0951572Z Requirement already satisfied: numpy in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 5)) (1.21.6) 2022-05-18T05:10:13.0963008Z Requirement already satisfied: psutil in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 6)) (5.9.0) 2022-05-18T05:10:13.1098436Z Requirement already satisfied: pyyaml in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 7)) (6.0) 2022-05-18T05:10:13.1109329Z Requirement already satisfied: requests in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 8)) (2.26.0) 2022-05-18T05:10:13.1274319Z Requirement already satisfied: setuptools in 
/usr/lib/python3.7/site-packages (from -r requirements.txt (line 9)) (49.1.3) 2022-05-18T05:10:13.1520636Z Requirement already satisfied: six in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 10)) (1.16.0) 2022-05-18T05:10:13.1532127Z Requirement already satisfied: types-dataclasses in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 11)) (0.6.5) 2022-05-18T05:10:13.1539777Z Requirement already satisfied: typing_extensions in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 12)) (4.2.0) 2022-05-18T05:10:13.1553966Z Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from astunparse->-r requirements.txt (line 2)) (0.37.1) 2022-05-18T05:10:13.1587898Z Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 8)) (2.0.12) 2022-05-18T05:10:13.1615898Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 8)) (1.26.9) 2022-05-18T05:10:13.1902324Z Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 8)) (2021.10.8) 2022-05-18T05:10:13.1912884Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 8)) (3.3) 2022-05-18T05:10:13.2517213Z + python3 -m pip install boto3==1.19.12 2022-05-18T05:10:13.5381220Z Defaulting to user installation because normal site-packages is not writeable 2022-05-18T05:10:13.5594194Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12) 2022-05-18T05:10:13.5662873Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2) 2022-05-18T05:10:13.5700006Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0) 2022-05-18T05:10:13.5717240Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12) 2022-05-18T05:10:13.5776961Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9) 2022-05-18T05:10:13.5992282Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2) 2022-05-18T05:10:13.6019998Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0) 2022-05-18T05:10:13.7138747Z + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2022-05-18T05:10:19.2207607Z [scribe] Scribe access token not provided, sending report via boto3... 
2022-05-18T05:10:19.2207899Z 
2022-05-18T05:10:19.2208266Z ----- Historic stats comparison result ------ 
2022-05-18T05:10:19.2208507Z 
2022-05-18T05:10:19.2208765Z job: linux-bionic-cuda10.2-py3.9-gcc7-test 
2022-05-18T05:10:19.2209111Z commit: 3b2375291aab7b48442f2e6fb1ef66cebc761e24 
2022-05-18T05:10:19.2209318Z 
2022-05-18T05:10:19.2209529Z Commit graph (base is most recent master ancestor with at least one S3 report): 
2022-05-18T05:10:19.2209775Z 
2022-05-18T05:10:19.2209881Z : (master) 
2022-05-18T05:10:19.2210087Z | 
2022-05-18T05:10:19.2210354Z * 3b2375291a (HEAD) total time 3356.93s 
2022-05-18T05:10:19.2210936Z * 6e3391a7c3 (base) 4 reports, total time 1395.65s ± 849.13s 
2022-05-18T05:10:19.2211360Z * 48581d74ad 4 reports, total time 1369.92s ± 835.10s 
2022-05-18T05:10:19.2211787Z * c35bd8d423 5 reports, total time 1366.08s ± 729.73s 
2022-05-18T05:10:19.2212256Z * f6beda89c6 7 reports, total time 1559.82s ± 1628.51s 
2022-05-18T05:10:19.2212696Z * ee080918df 9 reports, total time 2710.43s ± 2712.08s 
2022-05-18T05:10:19.2212991Z * bbaefdf6b5 0 reports 
2022-05-18T05:10:19.2213265Z * 7c52f204e0 0 reports 
2022-05-18T05:10:19.2213526Z * e0451d8022 0 reports 
2022-05-18T05:10:19.2213892Z * 4e2f5507d0 9 reports, total time 2696.76s ± 2644.11s 
2022-05-18T05:10:19.2214314Z * b64845eb18 9 reports, total time 2712.94s ± 2654.45s 
2022-05-18T05:10:19.2214587Z | 
2022-05-18T05:10:19.2214789Z : 
2022-05-18T05:10:19.2214927Z 
2022-05-18T05:10:19.2215097Z Removed (across 576 suites) 0 tests, totaling 0.00s 
2022-05-18T05:10:19.2215454Z Modified (across 0 suites) 0 tests, totaling 0.00s 
2022-05-18T05:10:19.2215788Z Added (across 77 suites) 992 tests, totaling +3356.93s 
2022-05-18T05:10:19.2741326Z Prepare all required actions 
2022-05-18T05:10:19.2764121Z ##[group]Run ./.github/actions/teardown-linux 
2022-05-18T05:10:19.2764408Z with: 
2022-05-18T05:10:19.2764608Z env: 
2022-05-18T05:10:19.2764840Z IN_CI: 1 
2022-05-18T05:10:19.2765061Z IS_GHA: 1 
2022-05-18T05:10:19.2765294Z GIT_DEFAULT_BRANCH: master 
2022-05-18T05:10:19.2765564Z GPU_FLAG: --gpus all 
2022-05-18T05:10:19.2765810Z ##[endgroup] 
2022-05-18T05:10:19.2782694Z ##[group]Run .github/scripts/wait_for_ssh_to_drain.sh 
2022-05-18T05:10:19.2783051Z .github/scripts/wait_for_ssh_to_drain.sh 
2022-05-18T05:10:19.2796558Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 
2022-05-18T05:10:19.2796862Z env: 
2022-05-18T05:10:19.2797085Z IN_CI: 1 
2022-05-18T05:10:19.2797290Z IS_GHA: 1 
2022-05-18T05:10:19.2797542Z GIT_DEFAULT_BRANCH: master 
2022-05-18T05:10:19.2797809Z GPU_FLAG: --gpus all 
2022-05-18T05:10:19.2798042Z ##[endgroup] 
2022-05-18T05:10:19.2841939Z Holding runner for 2 hours until all ssh sessions have logged out 
2022-05-18T05:10:19.2889065Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 
2022-05-18T05:10:19.2889488Z # ignore expansion of "docker ps -q" since it could be empty 
2022-05-18T05:10:19.2889985Z # shellcheck disable=SC2046 
2022-05-18T05:10:19.2890291Z docker stop $(docker ps -q) || true 
2022-05-18T05:10:19.2890623Z # Prune all of the docker images 
2022-05-18T05:10:19.2890937Z docker system prune -af 
2022-05-18T05:10:19.2903512Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 
2022-05-18T05:10:19.2904090Z env: 
2022-05-18T05:10:19.2904324Z IN_CI: 1 
2022-05-18T05:10:19.2904543Z IS_GHA: 1 
2022-05-18T05:10:19.2904806Z GIT_DEFAULT_BRANCH: master 
2022-05-18T05:10:19.2905088Z GPU_FLAG: --gpus all 
2022-05-18T05:10:19.2905331Z ##[endgroup] 
2022-05-18T05:10:19.7067289Z 04c3040422fc 
2022-05-18T05:10:20.4045367Z Deleted 
Containers: 2022-05-18T05:10:20.4046070Z 04c3040422fca0f61bbc0b8d1c290660850f2b3df08e97daf38cf60eb8907ef4 2022-05-18T05:10:20.4046458Z 2022-05-18T05:10:24.4028936Z Deleted Images: 2022-05-18T05:10:24.4030257Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T05:10:24.4031837Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7@sha256:9737b662edb86afcd12a9367db6178a57889543632c0b710c5058abe14dc048f 2022-05-18T05:10:24.4032843Z deleted: sha256:914b650c5e1ee0f842697bbae2306dd6d831a4fa7fb861ca07bf056998b8539a 2022-05-18T05:10:24.4033641Z deleted: sha256:1034dda927c8a98e2c5d65a336554b89dbbe1e12c28d4d48b88e54f147a2e4e0 2022-05-18T05:10:24.4034402Z deleted: sha256:9daaaebd2559405012ffcc55a915a07af3fd8dfffccf3a4095f52a8d3a2a0808 2022-05-18T05:10:24.4034838Z deleted: sha256:62633a2457311070c286784502f87ac7817442880550ef46ef31086f62f63bd8 2022-05-18T05:10:24.4035247Z deleted: sha256:b40174881876c17fba8e4416c64d2b2065ba27f412e978c62446c6bd9975f43d 2022-05-18T05:10:24.4035646Z deleted: sha256:2b9f3cf2c41f5277698e8a3507d610f08552eb7289a4388f78a18f4934288b8c 2022-05-18T05:10:24.4036096Z deleted: sha256:bd03b60328b2f30ca7a665b612f2cc06f82974a2523f37e690f2eb32b20e23b1 2022-05-18T05:10:24.4036537Z deleted: sha256:9ead2207e8271970850e6a2fd7eacfc78f81c37d45b383107a12ba34b33a0068 2022-05-18T05:10:24.4036986Z deleted: sha256:03fe25e910ef9c726eef212a600805ba6fdd2cba133eec3a76ae6a62e71c50a8 2022-05-18T05:10:24.4037411Z deleted: sha256:42e9502eca4ade58460a090e6049a4c886d6667dc476a43c122110e9970e0504 2022-05-18T05:10:24.4037819Z deleted: sha256:3e18692fe2820772fe2b383c23571e3871b1e76e6ed758ca077a24e1fdae6a28 2022-05-18T05:10:24.4038249Z deleted: sha256:a9c1ea768838d14bfbdde1eb39006e75c504ef0e289e20b1cf1a0960ad20d993 2022-05-18T05:10:24.4038691Z deleted: sha256:653ed47cee104744163b9185cfc53ab6e751d141965b21a2f8bff4fb24acfd37 2022-05-18T05:10:24.4039097Z deleted: sha256:2ff0727ba124b0079c011424c629c2a5e27c5d7afb7b950b5513d4ab4f5e958d 2022-05-18T05:10:24.4039716Z deleted: sha256:4c3c43891ad25595b7374a30159f60ec584375dbc3820ecb30f5ad0374e5e86e 2022-05-18T05:10:24.4040165Z deleted: sha256:fa7d613a19e64cdd36a0c27fc6a2a50dd27c841da90bfae85e542064284ab2fd 2022-05-18T05:10:24.4040615Z deleted: sha256:22ec6f7d0cdf47c266dd9f601a0c98bd88bbd7e4ce3d21c9f7e00349cf7a0f8d 2022-05-18T05:10:24.4041080Z deleted: sha256:3ddedefb6de6867b92dc64bef9ed3206b098bcec87336ba702a4eec81de23bdf 2022-05-18T05:10:24.4041541Z deleted: sha256:6d2243fa3601d3ad6f7187388ef2f63d2eb318689d897e70fafdf33f22667537 2022-05-18T05:10:24.4041963Z deleted: sha256:8d2732c0f78444380cf8b5381c7b649a2e38315a0c11b8f03c7aab8f436d5390 2022-05-18T05:10:24.4042371Z deleted: sha256:85365c4faa86a33743f2107ccd2057705ec1aba1968cfeafbd737362b5499158 2022-05-18T05:10:24.4042799Z deleted: sha256:1aa2e018ba9609d32285b9d5ae5d41d884801742d27f0cbfcd249ab14b4bd4dc 2022-05-18T05:10:24.4043208Z deleted: sha256:3e096c567269719a45cda64f50eb9814c8bb7049822811461314641c8eb96c61 2022-05-18T05:10:24.4043610Z deleted: sha256:6c5ba201ed4d2056c53645f53d30efb8e4ba80fbea2c45042319090bd48d473c 2022-05-18T05:10:24.4044038Z deleted: sha256:64928ee816f9ae39d46f7dd36a5e45302562fd147967f6ab287a487c354b6b6c 2022-05-18T05:10:24.4044455Z deleted: sha256:a57b906a61d609815d662f4f4b65996a46514b07aea462793fd4143718ffc840 2022-05-18T05:10:24.4045022Z deleted: sha256:494612c761757956fbc4227e61a4a1e63e0f9b3372cf2430e2ee002ab523cfde 
2022-05-18T05:10:24.4045455Z deleted: sha256:ff2c733048c22f423a6b20c35ff08bcbb6fe1bc76306464e654ef1ee28c3d861 2022-05-18T05:10:24.4045912Z deleted: sha256:4661c25de76163d8c2e45ca688f5b819c61c6c9f8e49ed83df44db353263f033 2022-05-18T05:10:24.4046364Z deleted: sha256:e23913a427ab6a1d96fc5ac9b9916776209427c2ef8eb9f44a4d16735f8c8494 2022-05-18T05:10:24.4046807Z deleted: sha256:832b3ad6407fa37ec6d8fd8f9d28172fe3bd5f6280fad98472d09eb0bc252ae0 2022-05-18T05:10:24.4047289Z deleted: sha256:a2b9dd02872fa4e35324d54aba02a6f1f21cb993714948cb709f94a2d85029f9 2022-05-18T05:10:24.4047755Z deleted: sha256:cb96bd5b78d181c6c2779f27e47036a8e9c3e1bcf09da94039148abd1c7d05ee 2022-05-18T05:10:24.4048240Z deleted: sha256:ea52cef0d0fe0c5edd5d235153b16fb0ce71bd0120ad33ed45f75bbfa3d9eadf 2022-05-18T05:10:24.4048702Z deleted: sha256:4fb97c7eb8955725be2bae74694a3af51e36e515a6c92a1aa75965cc09864f99 2022-05-18T05:10:24.4049161Z deleted: sha256:b2537994f751dde0a341c1f0d09a833be0150eb5a1cd60c7e65874442f6475a3 2022-05-18T05:10:24.4049840Z deleted: sha256:412f35baea526807361ea20e8f0e18576bdf2c6c40bdec402e94d86222a2b56e 2022-05-18T05:10:24.4050413Z deleted: sha256:cf621551bc4ed287124425a3d232f6c751dff14e9986bf7b7a697634d2f599bc 2022-05-18T05:10:24.4050849Z deleted: sha256:8003ff14feede16807731ad20c8151882bb62d724eb628e4c99ceaa2eea2a479 2022-05-18T05:10:24.4051278Z deleted: sha256:a1270a733ee0912cf66cd39d15f2ceace3789554b56647c5a5638b6ba73e3dab 2022-05-18T05:10:24.4051733Z deleted: sha256:a2811bdab35ec13d2eb84fdf4de75cbd29c5f6e227e4f11e9e8a9de714b7e132 2022-05-18T05:10:24.4052161Z deleted: sha256:f80e00922ecb54c1458a8c92d41e262173286ff550ed7468674de42de539714b 2022-05-18T05:10:24.4052596Z deleted: sha256:eb265251ed90e139bb4bfd41d9fa6a2cc6275eab106538fead323171069af9c9 2022-05-18T05:10:24.4053055Z deleted: sha256:fbee4dd8d443dcf0791e3965ee624b8ecc7b15d503ffbf8f2912d4d1d0a0cb47 2022-05-18T05:10:24.4053492Z deleted: sha256:e2f7d8e2982218fbc16adfe64b71e1839795e7a3ea82f5ff65336d58ae4cea0b 2022-05-18T05:10:24.4053937Z deleted: sha256:275df7d7943e762bf0a85fc2a9cd297c01ecb5d87ae4d86466c3f7f704d1c778 2022-05-18T05:10:24.4054382Z deleted: sha256:c2c5293df593b2d991852fe08e5db0f8c5d3c06b64247dc508084e747e64a42e 2022-05-18T05:10:24.4054813Z deleted: sha256:986cd2e7c143559516bc8388d5dd603eec6a1be4855c777c7e7f16bf22b9fa23 2022-05-18T05:10:24.4055232Z deleted: sha256:9d6787a516e72b7ed9422c8df1a4b298d82982bdf80ee1e198eedf1e1a010d76 2022-05-18T05:10:24.4055481Z 2022-05-18T05:10:24.4063318Z Total reclaimed space: 12.14GB 2022-05-18T05:10:24.4125481Z Post job cleanup. 2022-05-18T05:10:24.4161079Z Post job cleanup. 
2022-05-18T05:10:24.5522028Z [command]/usr/bin/git version 2022-05-18T05:10:24.5573453Z git version 2.32.0 2022-05-18T05:10:24.5640008Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/0ba56b7e-8193-4c66-8177-09113ea95149' before making global git config changes 2022-05-18T05:10:24.5640590Z Adding repository directory to the temporary git global config as a safe directory 2022-05-18T05:10:24.5649588Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2022-05-18T05:10:24.5697443Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2022-05-18T05:10:24.5736278Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || : 2022-05-18T05:10:24.6065070Z Entering 'android/libs/fbjni' 2022-05-18T05:10:24.6105467Z Entering 'third_party/FP16' 2022-05-18T05:10:24.6148607Z Entering 'third_party/FXdiv' 2022-05-18T05:10:24.6188689Z Entering 'third_party/NNPACK' 2022-05-18T05:10:24.6230418Z Entering 'third_party/QNNPACK' 2022-05-18T05:10:24.6271020Z Entering 'third_party/XNNPACK' 2022-05-18T05:10:24.6324160Z Entering 'third_party/benchmark' 2022-05-18T05:10:24.6365146Z Entering 'third_party/cpuinfo' 2022-05-18T05:10:24.6405821Z Entering 'third_party/cub' 2022-05-18T05:10:24.6446735Z Entering 'third_party/cudnn_frontend' 2022-05-18T05:10:24.6492902Z Entering 'third_party/eigen' 2022-05-18T05:10:24.6536868Z Entering 'third_party/fbgemm' 2022-05-18T05:10:24.6578831Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T05:10:24.6619391Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T05:10:24.6660031Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T05:10:24.6702145Z Entering 'third_party/flatbuffers' 2022-05-18T05:10:24.6748766Z Entering 'third_party/fmt' 2022-05-18T05:10:24.6790470Z Entering 'third_party/foxi' 2022-05-18T05:10:24.6830131Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T05:10:24.6870918Z Entering 'third_party/gloo' 2022-05-18T05:10:24.6911171Z Entering 'third_party/googletest' 2022-05-18T05:10:24.6951066Z Entering 'third_party/ideep' 2022-05-18T05:10:24.6992897Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T05:10:24.7034559Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T05:10:24.7082520Z Entering 'third_party/ios-cmake' 2022-05-18T05:10:24.7123090Z Entering 'third_party/kineto' 2022-05-18T05:10:24.7165011Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T05:10:24.7206919Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T05:10:24.7251245Z Entering 'third_party/nccl/nccl' 2022-05-18T05:10:24.7292167Z Entering 'third_party/neon2sse' 2022-05-18T05:10:24.7334320Z Entering 'third_party/onnx' 2022-05-18T05:10:24.7389835Z Entering 'third_party/onnx/third_party/benchmark' 2022-05-18T05:10:24.7432237Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T05:10:24.7476589Z Entering 'third_party/onnx-tensorrt' 2022-05-18T05:10:24.7517195Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T05:10:24.7563224Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T05:10:24.7604650Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T05:10:24.7646536Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T05:10:24.7692106Z Entering 'third_party/pocketfft' 
2022-05-18T05:10:24.7735282Z Entering 'third_party/protobuf'
2022-05-18T05:10:24.7780237Z Entering 'third_party/protobuf/third_party/benchmark'
2022-05-18T05:10:24.7821123Z Entering 'third_party/protobuf/third_party/googletest'
2022-05-18T05:10:24.7864080Z Entering 'third_party/psimd'
2022-05-18T05:10:24.7905883Z Entering 'third_party/pthreadpool'
2022-05-18T05:10:24.7947822Z Entering 'third_party/pybind11'
2022-05-18T05:10:24.7990302Z Entering 'third_party/python-enum'
2022-05-18T05:10:24.8030366Z Entering 'third_party/python-peachpy'
2022-05-18T05:10:24.8071863Z Entering 'third_party/python-six'
2022-05-18T05:10:24.8113783Z Entering 'third_party/sleef'
2022-05-18T05:10:24.8154046Z Entering 'third_party/tbb'
2022-05-18T05:10:24.8198479Z Entering 'third_party/tensorpipe'
2022-05-18T05:10:24.8240042Z Entering 'third_party/tensorpipe/third_party/googletest'
2022-05-18T05:10:24.8281398Z Entering 'third_party/tensorpipe/third_party/libnop'
2022-05-18T05:10:24.8322799Z Entering 'third_party/tensorpipe/third_party/libuv'
2022-05-18T05:10:24.8364863Z Entering 'third_party/tensorpipe/third_party/pybind11'
2022-05-18T05:10:24.8405032Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2022-05-18T05:10:24.8450130Z Entering 'third_party/zstd'
2022-05-18T05:10:24.8513281Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2022-05-18T05:10:24.8544098Z http.https://github.com/.extraheader
2022-05-18T05:10:24.8556264Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2022-05-18T05:10:24.8597201Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
2022-05-18T05:10:24.8923672Z Entering 'android/libs/fbjni'
2022-05-18T05:10:24.8947563Z http.https://github.com/.extraheader
2022-05-18T05:10:24.8978898Z Entering 'third_party/FP16'
2022-05-18T05:10:24.9002928Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9034658Z Entering 'third_party/FXdiv'
2022-05-18T05:10:24.9058690Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9089837Z Entering 'third_party/NNPACK'
2022-05-18T05:10:24.9114913Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9147326Z Entering 'third_party/QNNPACK'
2022-05-18T05:10:24.9171067Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9202578Z Entering 'third_party/XNNPACK'
2022-05-18T05:10:24.9227852Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9272426Z Entering 'third_party/benchmark'
2022-05-18T05:10:24.9296198Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9327552Z Entering 'third_party/cpuinfo'
2022-05-18T05:10:24.9351621Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9383131Z Entering 'third_party/cub'
2022-05-18T05:10:24.9407523Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9439150Z Entering 'third_party/cudnn_frontend'
2022-05-18T05:10:24.9463476Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9501077Z Entering 'third_party/eigen'
2022-05-18T05:10:24.9526123Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9560638Z Entering 'third_party/fbgemm'
2022-05-18T05:10:24.9586171Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9617204Z Entering 'third_party/fbgemm/third_party/asmjit'
2022-05-18T05:10:24.9641668Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9673198Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2022-05-18T05:10:24.9697136Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9729728Z Entering 'third_party/fbgemm/third_party/googletest'
2022-05-18T05:10:24.9754883Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9788553Z Entering 'third_party/flatbuffers'
2022-05-18T05:10:24.9812433Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9847910Z Entering 'third_party/fmt'
2022-05-18T05:10:24.9872534Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9903586Z Entering 'third_party/foxi'
2022-05-18T05:10:24.9928453Z http.https://github.com/.extraheader
2022-05-18T05:10:24.9959216Z Entering 'third_party/gemmlowp/gemmlowp'
2022-05-18T05:10:24.9983838Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0016671Z Entering 'third_party/gloo'
2022-05-18T05:10:25.0041421Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0073213Z Entering 'third_party/googletest'
2022-05-18T05:10:25.0097018Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0130478Z Entering 'third_party/ideep'
2022-05-18T05:10:25.0154841Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0186185Z Entering 'third_party/ideep/mkl-dnn'
2022-05-18T05:10:25.0209637Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0243173Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2022-05-18T05:10:25.0267376Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0305621Z Entering 'third_party/ios-cmake'
2022-05-18T05:10:25.0329334Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0360992Z Entering 'third_party/kineto'
2022-05-18T05:10:25.0385420Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0416533Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2022-05-18T05:10:25.0440753Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0472251Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2022-05-18T05:10:25.0495484Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0528884Z Entering 'third_party/nccl/nccl'
2022-05-18T05:10:25.0553921Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0585477Z Entering 'third_party/neon2sse'
2022-05-18T05:10:25.0608798Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0639899Z Entering 'third_party/onnx'
2022-05-18T05:10:25.0663628Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0707935Z Entering 'third_party/onnx/third_party/benchmark'
2022-05-18T05:10:25.0731575Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0762674Z Entering 'third_party/onnx/third_party/pybind11'
2022-05-18T05:10:25.0788007Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0821005Z Entering 'third_party/onnx-tensorrt'
2022-05-18T05:10:25.0845362Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0876162Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2022-05-18T05:10:25.0900175Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0936891Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2022-05-18T05:10:25.0961565Z http.https://github.com/.extraheader
2022-05-18T05:10:25.0993673Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2022-05-18T05:10:25.1017075Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1048434Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2022-05-18T05:10:25.1072893Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1109037Z Entering 'third_party/pocketfft'
2022-05-18T05:10:25.1133406Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1165945Z Entering 'third_party/protobuf'
2022-05-18T05:10:25.1190253Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1226363Z Entering 'third_party/protobuf/third_party/benchmark'
2022-05-18T05:10:25.1249702Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1281555Z Entering 'third_party/protobuf/third_party/googletest'
2022-05-18T05:10:25.1306165Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1339335Z Entering 'third_party/psimd'
2022-05-18T05:10:25.1363403Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1395610Z Entering 'third_party/pthreadpool'
2022-05-18T05:10:25.1419574Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1451590Z Entering 'third_party/pybind11'
2022-05-18T05:10:25.1475519Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1507799Z Entering 'third_party/python-enum'
2022-05-18T05:10:25.1531472Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1562665Z Entering 'third_party/python-peachpy'
2022-05-18T05:10:25.1587409Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1619070Z Entering 'third_party/python-six'
2022-05-18T05:10:25.1643109Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1674882Z Entering 'third_party/sleef'
2022-05-18T05:10:25.1698725Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1730790Z Entering 'third_party/tbb'
2022-05-18T05:10:25.1755786Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1789518Z Entering 'third_party/tensorpipe'
2022-05-18T05:10:25.1813716Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1845334Z Entering 'third_party/tensorpipe/third_party/googletest'
2022-05-18T05:10:25.1869567Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1900359Z Entering 'third_party/tensorpipe/third_party/libnop'
2022-05-18T05:10:25.1924303Z http.https://github.com/.extraheader
2022-05-18T05:10:25.1955385Z Entering 'third_party/tensorpipe/third_party/libuv'
2022-05-18T05:10:25.1978939Z http.https://github.com/.extraheader
2022-05-18T05:10:25.2011150Z Entering 'third_party/tensorpipe/third_party/pybind11'
2022-05-18T05:10:25.2034886Z http.https://github.com/.extraheader
2022-05-18T05:10:25.2065805Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2022-05-18T05:10:25.2089306Z http.https://github.com/.extraheader
2022-05-18T05:10:25.2124240Z Entering 'third_party/zstd'
2022-05-18T05:10:25.2149395Z http.https://github.com/.extraheader
2022-05-18T05:10:25.2447395Z Cleaning up orphan processes
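Note: the [command] lines in the post-job cleanup above show the checkout step scrubbing the injected auth header from the main repository and from every submodule before the runner is reused. Condensed into a standalone sketch (the commands are taken directly from the log; the action's own quoting and ordering may differ slightly):

  # unset the extra auth header in the superproject, then in each submodule
  git config --local --unset-all http.https://github.com/.extraheader
  git submodule foreach --recursive "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"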